Skip to content

test: stabilize flaky CI tests#3606

Draft
dkrizan wants to merge 4 commits intomainfrom
dkrizan/fix-flaky-ci-tests
Draft

test: stabilize flaky CI tests#3606
dkrizan wants to merge 4 commits intomainfrom
dkrizan/fix-flaky-ci-tests

Conversation

@dkrizan
Copy link
Copy Markdown
Member

@dkrizan dkrizan commented Apr 15, 2026

Summary

Stabilizes flaky tests reported in CI. Draft because more flaky-test fixes may be added here.

Current contents — fixes for tests that failed in https://github.com/tolgee/billing/actions/runs/24438119615:

  • `KeyControllerKeySearchTest.search is not slow` — adds warmup queries before the timed retries (warms JIT, connection pool, Postgres planner) and relaxes the per-request threshold from 5000ms to 10000ms. CI runners legitimately exceed 5s under load.

  • `BatchJobsGeneralWithoutRedisTest.mt job respects maxPerJobConcurrency` — `assertMaxPerJobConcurrencyIsLessThanOrEqualTo` now counts only this job's own running executions (was counting all of `runningJobs`, which can include stray executions from other jobs in the shared Spring context), tracks the maximum observed across polls, and waits for both queue drain and zero running for this job before asserting. The previous version asserted on a transient instant value, and `waitFor` does not catch `AssertionError`, so any single observation above the cap failed the test.

  • `WebsocketWithRedisTest.notifies on key modification` — bumps the pre-dispatch sleep in `assertNotified` from 200ms to 1000ms. Under CI load (especially with the Redis broker relay), the STOMP SUBSCRIBE frame was not always registered server-side when the dispatch fired, so the broadcast missed the not-yet-registered subscription. Also bumps the receive timeout from 3s to 10s.

Companion billing-side PR: https://github.com/tolgee/billing/pull/260

Test plan

  • `BatchJobsGeneralWithoutRedisTest.mt job respects maxPerJobConcurrency` passes locally
  • `WebsocketWithRedisTest.notifies on key modification` passes locally
  • CI green

KeyControllerKeySearchTest.search is not slow:
- Add warmup queries before the timed retries so JIT, connection pool
  and Postgres query planner are warm before measurement.
- Relax the per-request threshold from 5000ms to 10000ms — CI runners
  legitimately exceed 5s under load.

BatchJobsGeneralWithoutRedisTest.mt job respects maxPerJobConcurrency:
- assertMaxPerJobConcurrencyIsLessThanOrEqualTo now takes the job and
  counts only its own running executions, ignoring stray executions
  from other jobs in the shared Spring context.
- Track the maximum observed concurrency across polls and assert at
  the end, instead of failing on a transient instant value (waitFor
  did not catch AssertionErrors, so any single observation above the
  cap failed the test).
- Wait until both the queue is drained AND no executions are running
  for this job before asserting.

WebsocketWithRedisTest.notifies on key modification:
- Bump the pre-dispatch sleep in assertNotified from 200ms to 1000ms.
  Under CI load (especially with the Redis broker relay) the STOMP
  SUBSCRIBE frame was not yet processed server-side when the dispatch
  fired, so the broadcast missed the not-yet-registered subscription.
- Bump the receive timeout from 3s to 10s for additional CI tolerance.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 15, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 65ac746d-20ae-4ba8-b7be-56cd643f37a8

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch dkrizan/fix-flaky-ci-tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

dkrizan added 3 commits April 15, 2026 16:17
KeyControllerKeySearchTest.search is not slow:
- The warmup queries added in the previous commit ran outside
  executeInNewTransaction and returned HTTP 500 because the entities
  set up by saveAndPrepare were detached from the persistence session
  by the time the request fired. Wrap each warmup call in
  executeInNewTransaction the same way the timed calls are.

WebsocketTestHelper.waitForAuthenticationStatus:
- Bump the timeout from 5s to 10s. doesn't subscribe as another user
  failed waiting for the FORBIDDEN frame under CI load — same class
  of CI tightness the other websocket fix addressed.
Replace exact string match of the CSV header with parsed assertions
that check fixed columns order, required language columns (cs, fr),
alphabetical sorting, and base language exclusion. This prevents
flaky failures when shared Spring context data adds extra languages.
The default 10s timeout is insufficient under CI load when waiting
for 110+ mock invocations through the Redis-relayed batch pipeline,
causing intermittent AssertionFailedError in the retries-and-fails
scenario of BatchJobsGeneralWithRedisTest.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant