Hunting down flaky tests with `--repeat-until-failure`

almirsarajcic

5 hours ago

Nothing destroys developer confidence like tests that randomly fail in CI. One day your build is green, the next it’s red with the exact same code. The culprit? Flaky tests that pass locally but fail unpredictably in different environments.

# Run a test up to 100 times or until it fails
mix test \
  test/elixir_drops_web/live/user_drop_live_test.exs:398 \
  --repeat-until-failure 100

This ExUnit flag is your secret weapon for reproducing intermittent failures. Instead of crossing your fingers and hoping the test fails again, you force it to reveal its flaky behavior.
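Here is a rough model, in plain Elixir, of what the flag does: it reruns a check until the check raises or the run budget runs out. `RepeatSketch.until_failure/2` is a hypothetical helper written only for this illustration. It is not how ExUnit implements `--repeat-until-failure`.

```elixir
defmodule RepeatSketch do
  # Hypothetical helper (not ExUnit's implementation): run `fun` up to
  # `max_runs` times and stop at the first run that raises.
  @spec until_failure((-> any()), pos_integer()) ::
          {:all_passed, pos_integer()} | {:failed_on_run, pos_integer(), Exception.t()}
  def until_failure(fun, max_runs)
      when is_function(fun, 0) and is_integer(max_runs) and max_runs > 0 do
    Enum.reduce_while(1..max_runs, {:all_passed, max_runs}, fn run, acc ->
      try do
        fun.()
        # This run passed, so move on to the next one.
        {:cont, acc}
      rescue
        # A failing `assert` raises ExUnit.AssertionError, so it lands here too.
        error -> {:halt, {:failed_on_run, run, error}}
      end
    end)
  end
end

# Usage: a check that fails only on its third call (simulated flakiness).
flaky_check = fn ->
  runs = Process.get(:runs, 0) + 1
  Process.put(:runs, runs)
  if runs == 3, do: raise("simulated flaky failure")
end

RepeatSketch.until_failure(flaky_check, 100)
# => {:failed_on_run, 3, %RuntimeError{message: "simulated flaky failure"}}
```

The run number in the result shows how many attempts it took to surface the failure. A test that only fails once in a few dozen runs would almost never show up if you ran it by hand.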

Common flaky test patterns and fixes

The GenServer state race condition

# ❌ Flaky - race condition between async calls
test "updates counter state" do
  {:ok, pid} = Counter.start_link(0)

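  # Fire off an async increment. If increment/1 hands the work to another
  # process (a Task, another GenServer), the state may not be updated yet
  Counter.increment(pid)

  # Immediately check state - race condition!
  assert Counter.get(pid) == 1  # Might still be 0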
end

# ✅ Reliable - wait for the operation to complete
test "updates counter state" do
  {:ok, pid} = Counter.start_link(0)

  # Synchronous increment ensures completion
  :ok = Counter.increment_sync(pid)

  assert Counter.get(pid) == 1
end
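The tests above use a `Counter` module that the post never shows. Here is a minimal sketch. It assumes `increment/1` is a `GenServer.cast` and `increment_sync/1` is a `GenServer.call`, and every name is hypothetical. There is one caveat. With exactly this shape, the flaky test would actually pass every time. Messages from one process to another arrive in the order they were sent, so the cast is handled before the `get/1` call. The race appears when the increment is handed off to yet another process, or when the test reads state that bypasses this GenServer's mailbox, such as an ETS table or the database.

```elixir
defmodule Counter do
  # Hypothetical counter matching the names used in the tests above.
  use GenServer

  def start_link(initial) when is_integer(initial),
    do: GenServer.start_link(__MODULE__, initial)

  # Asynchronous: returns :ok immediately, before the state has changed.
  def increment(pid), do: GenServer.cast(pid, :increment)

  # Synchronous: returns only after handle_call/3 has updated the state.
  def increment_sync(pid), do: GenServer.call(pid, :increment)

  def get(pid), do: GenServer.call(pid, :get)

  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_cast(:increment, count), do: {:noreply, count + 1}

  @impl true
  def handle_call(:increment, _from, count), do: {:reply, :ok, count + 1}
  def handle_call(:get, _from, count), do: {:reply, count, count}
end
```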

The async operation timing issue

# ❌ Flaky - doesn't wait for async work
test "processes job asynchronously" do
  JobProcessor.process_async(job_id: 123)

  # Immediately check - job might not be done yet
  job = Repo.get!(Job, 123)
  assert job.status == :completed
end

# ✅ Reliable - wait for completion signal
test "processes job asynchronously" do
  JobProcessor.process_async(job_id: 123)

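  # Wait up to 1000 ms for the completion message. assert_receive only checks
  # the test process's mailbox, so the processor must send it here (e.g. via PubSub)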
  assert_receive {:job_completed, 123}, 1000

  job = Repo.get!(Job, 123)
  assert job.status == :completed
end
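For `assert_receive` to see `{:job_completed, 123}`, something has to send that message to the test process. Here is a minimal sketch that assumes a hypothetical `:notify` option, which is not part of the post's `JobProcessor`. It does the work in a `Task` and then reports back to whichever pid it was given.

```elixir
defmodule JobProcessor do
  # Hypothetical sketch: the post never shows JobProcessor, and the
  # `:notify` option is an assumption made for this example.
  def process_async(opts) do
    job_id = Keyword.fetch!(opts, :job_id)
    notify = Keyword.get(opts, :notify)

    Task.start(fn ->
      # Stand-in for the real work (e.g. updating the Job row via Repo).
      Process.sleep(50)

      # Signal completion only after the work is done.
      if notify, do: send(notify, {:job_completed, job_id})
    end)
  end
end
```

In the reliable test, you would call `JobProcessor.process_async(job_id: 123, notify: self())` before `assert_receive`. If the processor shouldn't know about its callers, a common alternative is to have the test subscribe to a Phoenix.PubSub topic and have the processor broadcast on it.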

Prevention strategy

  1. Use synchronous operations when testing state changes
  2. Add explicit waits with `assert_receive` for async operations
  3. Avoid immediate assertions after firing async calls
  4. Test one behavior per test to isolate race conditions
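Rule 2 also covers work that doesn't send a message of its own. If the work runs in a process that exits when it is done (a Task, for example), you can monitor that process and wait for its `:DOWN` message. In an ExUnit test this is usually `ref = Process.monitor(pid)` followed by `assert_receive {:DOWN, ^ref, :process, ^pid, _reason}, 1000`. The helper below is a hypothetical, framework-free version of the same wait, written so the mechanics are visible.

```elixir
defmodule WaitForExit do
  # Hypothetical helper: block until `pid` exits or `timeout` ms pass.
  def await(pid, timeout \\ 1_000) when is_pid(pid) do
    ref = Process.monitor(pid)

    receive do
      # Also fires right away (reason :noproc) if `pid` is already dead.
      {:DOWN, ^ref, :process, ^pid, reason} -> {:ok, reason}
    after
      timeout ->
        # Stop monitoring and drop any :DOWN that raced in after the timeout.
        Process.demonitor(ref, [:flush])
        {:error, :timeout}
    end
  end
end
```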

Run `mix test --repeat-until-failure` during development on any test that involves GenServers, async operations, or message passing, and catch the flakiness before it reaches CI.