Tutorial 9: Quarantine Flaky Tests

Contributor
Apr 24
3 min read

A flaky test trains the team to ignore failures. Eventually real bugs hide behind perceived flakiness. This tutorial walks through the discipline that prevents that.

What You'll Build

A working quarantine process: detect, isolate, investigate, fix or delete.

Step 1: Detect Flakiness (15 min)

Three signals:

1. Different result on rerun. A test passes on retry. CI captures this.

2. Different result by order. Test passes alone, fails with others (or vice versa).

3. Different result by environment. Passes locally, fails in CI.

Tools that detect:

GitHub Actions: --retries=1 in your test command captures retry stats
Test reporters that aggregate per-test pass/fail history
Specialized: Datadog Tests, Buildkite Test Analytics

Without detection, flakiness is anecdotal. With it, you can act.

Step 2: Define the Quarantine Process (10 min)

## Quarantine Process

When a test is detected as flaky:

1. **Within 24 hours:** mark it as quarantined (skip in the blocking suite)
2. **Within 1 week:** assigned an owner who investigates
3. **Within 2 weeks:** fixed or deleted

No test lives in quarantine indefinitely.

Document this. Reference in your testing strategy.

Step 3: Mark Tests as Quarantined (10 min)

In your test framework:

Python (pytest):

import pytest

@pytest.mark.flaky
def test_intermittent_thing():
    ...

In pytest.ini:

[pytest]
markers =
    flaky: marks tests as flaky (deselect with '-m "not flaky"')

Run blocking suite as:

pytest -m "not flaky"

Run quarantined tests separately:

pytest -m flaky --no-fail

Vitest/Jest:

test.skip('flaky thing', async () => { ... });
// or with a tag system you set up

The marker pulls them from the blocking path. They still run (for visibility) but don't block.

Step 4: Track the Quarantine List (10 min)

A shared doc:

# Flaky Test Quarantine

| Test | Quarantined | Owner | Status | Notes |
|------|-------------|-------|--------|-------|
| test_x | 2026-06-15 | Sam | Investigating | Suspected race condition |
| test_y | 2026-06-20 | Pat | In progress | Reproducing locally |

Updated as part of the weekly testing review.

Step 5: Investigate One Flake (varies)

Pick the oldest quarantined test:

Reproduce locally. Run it 20 times. Does it flake?
If yes: add diagnostic logging; identify the cause.
If no: environment difference. Investigate CI-specific factors.

Common causes:

Timing: fixed sleep, async without proper waits
Order: shared state between tests
Concurrency: race on shared resource
External: real third-party API slowness
Resource: memory/connection exhaustion

For each cause, the fix is different. Tutorial-9 of the Hands-On Software Testing path covers debugging in depth.

Step 6: Fix or Delete (varies)

For each flake:

Fix if: the test covers important behavior and the cause is addressable.

Delete if:

Test covers behavior already covered elsewhere
Test no longer matches current product
Cost to fix exceeds value of the test
Behavior is no longer worth verifying

Don't keep flaky tests "in case they catch something." They don't catch bugs; they catch noise.

Step 7: Verify the Fix (5 min)

After fixing a flake:

# Run 20 times
for i in {1..20}; do pytest tests/test_x.py; done

All should pass. If any fails, the fix wasn't sufficient. Investigate more.

Then remove the quarantine marker.

Step 8: Track the Aggregate (ongoing)

Monthly:

How many tests in quarantine?
Average time in quarantine?
How many flakes detected this month?

If quarantine count keeps growing, the process is failing. Either fix rate is too slow or new flakes are appearing too fast.

Step 9: Watch for Patterns (ongoing)

When 5 flakes have a similar cause:

All async tests with the same dependency
All tests in a specific file
All tests against a specific external service

The pattern points at a systemic issue. Address it once, not 5 times.

Step 10: Cultural Discipline (ongoing)

The hardest part: keep the team honest.

Don't accept "the test is just flaky" as an excuse to retry
Don't let CI auto-retry hide real bugs
Don't add new tests that are flaky from day one
Celebrate fixed flakes

A team that tolerates flakiness has a suite that tolerates broken code. A team that doesn't, has a suite they trust.

What You Just Did

You set up the discipline that keeps the test suite trustworthy. Flaky tests get caught, quarantined, and resolved on a clock. The team doesn't drift toward "tests are advisory."

Common Failure Modes

Flake budget. "We accept 2%." Real bugs hide there.

Auto-retry as remedy. Tests pass on retry; no investigation. Bugs ship.

Quarantine graveyard. Tests pile up in quarantine; nobody works them. Set a deadline.

Deleting too easily. Important coverage lost because flake was too hard to fix. Investigate before deleting.

Cultural acceptance. "It's just flaky." Push back.

Next Tutorial

Final tutorial: keeping the whole suite useful over time: Tutorial 10: Maintain a Long-Lived Test Suite.

ShiftQuality