Test Coverage: The Honest Version

Contributor
Mar 13

Test coverage is the most-cited and most-misunderstood testing metric. Teams pursue coverage targets, debate the "right" number, and feel disappointed when high coverage doesn't prevent bugs. The honest version: coverage tells you what code your tests executed, not what your tests verified. The distinction matters.

What Coverage Measures

Most coverage tools measure:

Line coverage: which lines were executed
Branch coverage: which conditional branches were taken
Function coverage: which functions were called
Statement coverage: which statements were executed

These tell you that code ran. They don't tell you whether tests asserted on the results.

A test that calls a function and never checks its output produces 100% function coverage for that function while testing nothing.

What Coverage Doesn't Measure

Whether assertions are meaningful. Coverage credits a test even if it asserts only expect(result).toBeTruthy().
Whether tests are correct. A test verifying wrong behavior still gives coverage.
Whether the right things were tested. Tests can hit 90% of lines and miss the critical 10%.
Whether tests would catch regressions. Mutation testing measures this; coverage doesn't.
Whether behavior is covered. Coverage is counted per line; behavior varies per combination of inputs, so a fully covered line can still have untested cases.
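The gap between covered lines and covered behavior can be shown with a one-line function. This is a small sketch; `average` and the test names are made up for illustration. A single test reaches full line and branch coverage, while an input that behaves differently goes unexamined.

```python
def average(values: list[float]) -> float:
    # Hypothetical helper: one statement, no branches.
    return sum(values) / len(values)


def test_average_typical_input() -> None:
    # This single test executes every line (and branch) of average().
    assert average([2.0, 4.0, 6.0]) == 4.0


def test_average_empty_input() -> None:
    # Same line, different input, different behavior: division by zero.
    # A coverage report gives no hint that this case was missing.
    try:
        average([])
    except ZeroDivisionError:
        return
    raise AssertionError("expected ZeroDivisionError for an empty list")
```

Whether raising on an empty list is the desired behavior is a design decision; the point is only that coverage cannot tell you the question was never asked.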

The Coverage-Worship Failure

A team commits to 80% coverage. Developers write tests to hit the number. Tests that exercise code without asserting meaningful behavior count toward coverage. The number rises; quality doesn't follow.

Six months later, production bugs continue. The team is confused: "We have 80% coverage." The number was the goal, but the number doesn't measure what they wanted.

The fix is to treat coverage as a leading indicator and a diagnostic tool, not a target.

What Coverage Is Good For

Despite the misuse, coverage has legitimate uses:

Finding untested code. If 30% of your file isn't executed by any test, that's information. Worth investigating.

Sanity-checking new tests. When you add tests for a feature, does coverage of the relevant files actually increase?

Detecting dead code. Code with no coverage and no usage may be removable.

Comparing over time. Trends in coverage (within the same codebase) signal whether testing is keeping up.

These are diagnostic uses. The number itself doesn't matter; what's revealed matters.

What Targets Often Get Wrong

The "right" coverage target is hotly debated. Common arguments:

"100%" is unrealistic. Some code is genuinely not worth testing.
"80%" is arbitrary. It works for some codebases, fails for others.
"0% target" undervalues testing.

A more useful framing: the target depends on the code.

Critical code (payment, security, data integrity): high coverage (90%+) with meaningful tests
Standard business logic: moderate coverage (70-85%)
Generated code, UI glue, boilerplate: low priority
Experimental code: may have zero coverage and that's fine

A flat coverage target across these categories produces over-investment in some, under-investment in others.
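One way to express per-category expectations is a small check that applies a different minimum to each area of the codebase. The sketch below is an assumption about how such a check might look, not a feature of any particular tool: the path prefixes, the thresholds, and the shape of the `report` dictionary are all hypothetical.

```python
# Hypothetical per-area minimums, checked in order; the first matching prefix wins.
MINIMUMS: list[tuple[str, float]] = [
    ("src/payments/", 0.90),   # critical code
    ("src/services/", 0.70),   # standard business logic
    ("src/generated/", 0.0),   # generated code: not gated
]
DEFAULT_MINIMUM = 0.0  # anything unlisted (e.g. experiments) is not gated


def minimum_for(path: str) -> float:
    """Return the minimum coverage expected for a file path."""
    for prefix, minimum in MINIMUMS:
        if path.startswith(prefix):
            return minimum
    return DEFAULT_MINIMUM


def shortfalls(measured: dict[str, float]) -> list[str]:
    """Return the files whose measured coverage is below their area's minimum."""
    return sorted(p for p, cov in measured.items() if cov < minimum_for(p))


# Made-up per-file coverage fractions, as a coverage report might provide them.
report = {
    "src/payments/charge.py": 0.82,
    "src/services/profile.py": 0.74,
    "src/generated/api_pb2.py": 0.05,
}
print(shortfalls(report))  # prints ['src/payments/charge.py']
```

Here the payment file is flagged at 82% while the generated file passes at 5%, which is the opposite of what a flat 80% target would report.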

Branch Coverage vs. Line Coverage

Line coverage is the most commonly reported metric, but it is the easier of the two to satisfy without exercising every path.

if condition:
    do_a()
else:
    do_b()

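Both branches need to be exercised for full behavior coverage. In this if/else, each branch has its own line, so line coverage does notice when a test takes only one of them. The blind spot appears when a branch has no line of its own: an if with no else, or a conditional written on a single line. There, a single test can execute every line while the other outcome of the condition never runs.

Branch coverage requires both outcomes of every condition to be executed, which brings it closer to behavior coverage.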

If you're tracking coverage, branch is more meaningful than line.
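The difference shows up clearly on an `if` with no `else`. The sketch below uses Python's standard `sys.settrace` hook to record which lines ran and which line-to-line transitions occurred. It is a toy tracer written for illustration, not how any specific coverage tool is implemented, and `clamp_negative` and `trace` are hypothetical names.

```python
import sys
from types import FrameType
from typing import Any, Callable


def clamp_negative(x: int) -> int:
    if x < 0:       # offset 1
        x = 0       # offset 2
    return x        # offset 3


def trace(func: Callable[..., Any], *args: Any) -> tuple[set[int], set[tuple[int, int]]]:
    """Run func(*args); return (lines executed, line-to-line transitions).

    Line numbers are offsets from the `def` line of func.
    """
    code = func.__code__
    lines: set[int] = set()
    arcs: set[tuple[int, int]] = set()
    previous = None  # offset of the last executed line, if any

    def on_line(frame: FrameType, event: str, arg: Any):
        nonlocal previous
        if event == "line":
            offset = frame.f_lineno - code.co_firstlineno
            lines.add(offset)
            if previous is not None:
                arcs.add((previous, offset))
            previous = offset
        return on_line

    def on_call(frame: FrameType, event: str, arg: Any):
        # Only trace the function under inspection.
        return on_line if frame.f_code is code else None

    old = sys.gettrace()
    sys.settrace(on_call)
    try:
        func(*args)
    finally:
        sys.settrace(old)
    return lines, arcs


lines, arcs = trace(clamp_negative, -5)
print(sorted(lines))    # every line ran: [1, 2, 3]
print((1, 3) in arcs)   # False: the path that skips the if body never ran
```

A single call with a negative input gives full line coverage, but the transition from the `if` straight to the `return` is missing. A branch-aware report would flag that; a line-only report would not.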

Coverage as PR Gate

A common practice: block PRs that decrease coverage.

Trade-offs:

Good: prevents drift
Bad: encourages low-quality tests to maintain the number

A better gate: require that new code has meaningful tests, evaluated in review, not just that the number didn't drop.

If you must use coverage as a gate, focus on coverage of changed lines, not the overall percentage. New code should be tested; old uncovered code can stay uncovered while it's not changing.

Differential Coverage

Track coverage of changes specifically. A PR that adds 100 lines should typically test most of them.

Many CI integrations support this, often under a name like diff, patch, or delta coverage. It's more useful than absolute coverage because it focuses on what's actively changing.
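The underlying calculation is simple: intersect the lines a change touched with the lines the test run executed. The following is a minimal sketch under assumed data shapes (a mapping from file path to line numbers for both inputs); the function name and sample data are hypothetical, and a real tool would also discard changed lines that are not executable, such as comments and blank lines.

```python
def changed_line_coverage(
    changed: dict[str, set[int]],
    executed: dict[str, set[int]],
) -> float:
    """Fraction of changed lines that at least one test executed."""
    total = sum(len(lines) for lines in changed.values())
    if total == 0:
        return 1.0  # nothing changed, so nothing is left uncovered
    hit = sum(
        len(lines & executed.get(path, set()))
        for path, lines in changed.items()
    )
    return hit / total


# Made-up data: lines touched by a PR, and lines executed by the test run.
changed = {"billing.py": {10, 11, 12, 13}, "utils.py": {40}}
executed = {"billing.py": {1, 2, 10, 11, 12}, "utils.py": {5, 6}}
print(changed_line_coverage(changed, executed))  # prints 0.6
```

Three of the five changed lines were executed, so the change is 60% covered regardless of what the overall percentage for the repository says.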

The Hard Cases

Code that's genuinely hard to cover:

Error paths that require triggering rare failures
Defensive code that may not be reachable in normal flow
Initialization code that runs only once
Concurrency edge cases

For each, coverage tools may flag the lines as uncovered even though the code is intentional. Three responses:

Test it anyway. Sometimes the rare error path is reachable with test setup.
Document why it's uncovered. A comment explaining the case.
Remove if truly unreachable. Dead code shouldn't exist.

Don't add fake tests just to cover hard-to-test code. The coverage number rises; the suite doesn't actually verify anything.
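For the "test it anyway" response, a rare failure can often be forced with a test double. The sketch below uses `unittest.mock.patch` from the Python standard library to make `open` raise, which drives execution into the error path and then asserts on the outcome. The function `load_settings` and its fallback behavior are hypothetical.

```python
import json
from unittest import mock


def load_settings(path: str) -> dict:
    """Hypothetical loader that falls back to empty settings on I/O errors."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except OSError:
        # Rare in practice: unreadable or missing file.
        return {}


def test_load_settings_falls_back_when_file_is_unreadable() -> None:
    # Force the rare failure instead of waiting for it to happen.
    # PermissionError is a subclass of OSError, so the except clause handles it.
    with mock.patch("builtins.open", side_effect=PermissionError("denied")):
        assert load_settings("settings.json") == {}
```

Unlike a fake test, this one would fail if the fallback were removed or changed, so the coverage it adds corresponds to something actually verified.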

Coverage Reports Beyond Numbers

The most useful part of a coverage report is the line-by-line view of what is covered and what isn't.

Use the report to:

Spot critical areas with no coverage
Find functions that look tested but actually aren't
Identify code paths that need attention
Verify new tests covered what they intended

The total percentage is the headline; the file-by-file detail is where insight lives.

Coverage and Test Quality

A theme: high coverage with weak tests is worse than moderate coverage with strong tests.

Stronger predictors of bug-catching than coverage:

Tests that fail when behavior breaks (not just when implementation changes)
Tests with meaningful assertions
Tests that verify edge cases, not just happy paths
Tests that the team trusts and runs

Coverage is part of the picture, not the picture.

Calibrating Coverage to Your Codebase

Different codebases need different coverage profiles:

Library code: high coverage justified (used by many)
Application code: moderate (used in one context, manually tested too)
Experimental code: low (may be discarded)
Generated code: typically not counted
UI styling: typically not counted

Configure coverage tools to exclude what shouldn't be counted, and hold the code that matters most to a stricter standard.

A Working Coverage Strategy

Practical recommendations:

Use coverage as a diagnostic, not a target. When it surprises you, investigate.
Track coverage of changes specifically. PRs should test what they change.
Aim for meaningful tests over high numbers. Quality first.
Focus coverage investment on critical code. Where bugs cost most.
Don't write tests to hit a number. Write tests that catch bugs.

When Coverage Is Hurting You

Warning signs:

Tests are getting written reluctantly to hit targets
Tests have weak assertions that exist only for coverage
The team is frustrated that "high coverage didn't prevent the bug"
Coverage discussions consume more time than test discussions

If these are happening, the coverage target is causing harm. Loosen the gate; refocus on test quality.

Coverage Compared to Mutation Score

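Mutation testing measures whether tests would catch bugs, whereas coverage only measures execution. It introduces small deliberate changes (mutants) into the code and checks whether at least one test fails for each.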

For critical code, mutation score is more meaningful than coverage. It's more expensive to run, but a single mutation-testing audit can reveal more about test quality than a year of coverage tracking.
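The idea can be illustrated by hand. In the sketch below, the mutant is written manually by changing one operator; real mutation testing tools generate and run such mutants automatically. All names here are hypothetical. Both suites execute every line of the function, so they look identical in a coverage report, but only one of them notices the mutation.

```python
from typing import Callable


def is_adult(age: int) -> bool:
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    return age > 18  # hand-written mutant: >= changed to >


def weak_suite(fn: Callable[[int], bool]) -> bool:
    """Return True if all checks pass. Covers every line, skips the boundary."""
    return fn(30) is True and fn(5) is False


def strong_suite(fn: Callable[[int], bool]) -> bool:
    """Same checks plus the boundary value."""
    return weak_suite(fn) and fn(18) is True


print(weak_suite(is_adult), weak_suite(is_adult_mutant))      # True True  (mutant survives)
print(strong_suite(is_adult), strong_suite(is_adult_mutant))  # True False (mutant killed)
```

A surviving mutant points at a specific missing assertion, here the boundary at 18, which is a more actionable finding than a percentage.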

Key Takeaway

Coverage measures execution, not verification. High coverage with weak tests is common and dangerous. Use coverage as a diagnostic — what's not tested? where are gaps? — rather than as a target. Track coverage of changed code specifically. Focus testing investment on critical code; accept lower coverage for boilerplate. Don't write tests to hit numbers; write tests that catch bugs. For deeper signal on test quality, mutation testing is more informative than coverage.

ShiftQuality