Integration Testing: When, How, and Why

Contributor
Mar 7

Integration tests verify that two or more components work together correctly. They sit between unit tests (which test one component in isolation) and end-to-end tests (which exercise the entire system). They're often the most valuable layer of the test pyramid and the most underused.

What Counts as Integration

There's no single definition. Common scopes:

Module integration: two or more classes in the same codebase working together
Service-to-database: code working with the real database
Service-to-service: two services communicating
Application: an HTTP request through the API to a response, with a real database

The choice of scope affects what bugs you catch and what the test costs.

A useful working definition: any test that exercises the real interaction between at least two components, with at least one of them not stubbed.

Why Integration Tests Matter

Unit tests with mocks can pass even when the system is broken. The mock returns what the test author expected; production returns something else. The bug only surfaces with real components.

Common integration-only bugs:

SQL queries that work in isolation but fail with real schemas
API contracts that don't match between caller and callee
Race conditions only visible with real timing
Transaction boundaries that don't behave as the unit tests assumed
Serialization issues between systems
Configuration that's correct in isolation but wrong in context

These bugs are exactly what integration tests catch and unit tests don't.
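To make the gap concrete, here is a minimal sketch of one such bug. The function `find_email` and the `users` schema are hypothetical, and in-memory SQLite is used only so the example runs without setup. The mocked test passes because the mock never parses the SQL; the test against a real schema exposes the wrong column name.

```python
import sqlite3
from unittest.mock import Mock


def find_email(conn, user_id):
    # Bug: the real column is named "email", not "mail".
    row = conn.execute("SELECT mail FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0] if row else None


def unit_test_with_mock():
    # The mock returns whatever the test author scripted, so the bad SQL is never parsed.
    conn = Mock()
    conn.execute.return_value.fetchone.return_value = ("sam@example.com",)
    return find_email(conn, 1)


def integration_test_with_real_schema():
    # A real engine checks the query against the actual table definition.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.execute("INSERT INTO users (id, email) VALUES (1, 'sam@example.com')")
    try:
        return find_email(conn, 1)
    except sqlite3.OperationalError as exc:
        return f"error: {exc}"
    finally:
        conn.close()
```

The mocked version returns the scripted email and reports success, while the version with a real schema surfaces the error on the first run.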

When to Use Integration Tests

Heavy use:

Code interacting with databases. The database is the application; testing without it tests fiction.
API endpoints. What a client actually calls.
Inter-service communication. Where contract breaks happen.
Stateful workflows. Multi-step operations that span components.

Light use:

Pure logic (use unit tests)
UI rendering (use higher-level tests if needed)
Third-party SaaS behavior (test the contract, not their internals)

The "Test Trophy" Shape

The traditional test pyramid (many unit, few integration, fewer e2e) is increasingly being challenged. Many modern test strategies use the "trophy" shape:

Foundation: static analysis (types, lint)
Many integration tests
Some unit tests for pure logic
Few end-to-end tests

The shift reflects modern realities: tooling makes integration tests fast enough to use heavily, and unit tests with heavy mocking often miss real bugs.

You don't need to commit to one shape. Use the test that catches the bug at the cost you're willing to pay.

Writing Integration Tests

A typical integration test:

def test_create_user_persists_and_returns_user():
    # Arrange: clean database state
    # (test_database and UserService come from the application under test)
    db = test_database()
    service = UserService(db=db)

    # Act: real call through the service to the database
    user = service.create_user(name="Sam", email="sam@example.com")

    # Assert: real query verifies the data
    fetched = db.users.find_by_id(user.id)
    assert fetched is not None  # fail clearly if nothing was persisted
    assert fetched.name == "Sam"
    assert fetched.email == "sam@example.com"

Key differences from a unit test:

Real database (in a test environment)
Real service-to-database call
Verification by re-querying, not by trusting the return value
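For readers who want something to run, the following is a self-contained sketch of the same test. The `UserService` body, the `make_test_database` helper, and the table layout are assumptions made for illustration; the article does not show them. In-memory SQLite stands in for the real engine only to keep the sketch free of setup, and a real suite should use the same database as production.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class User:
    id: int
    name: str
    email: str


class UserService:
    """Hypothetical service; the real implementation is not shown in the article."""

    def __init__(self, db):
        self.db = db

    def create_user(self, name, email):
        cur = self.db.execute(
            "INSERT INTO users (name, email) VALUES (?, ?)", (name, email)
        )
        self.db.commit()
        return User(id=cur.lastrowid, name=name, email=email)


def make_test_database():
    # Assumption: SQLite in memory, used here only so the sketch is runnable.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE users ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT NOT NULL UNIQUE)"
    )
    return db


def test_create_user_persists_and_returns_user():
    db = make_test_database()
    service = UserService(db=db)

    user = service.create_user(name="Sam", email="sam@example.com")

    # Re-query the database instead of trusting the returned object.
    row = db.execute(
        "SELECT name, email FROM users WHERE id = ?", (user.id,)
    ).fetchone()
    assert row == ("Sam", "sam@example.com")
    db.close()
```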

Speed

Integration tests are slower than unit tests. Acceptable ranges:

Single integration test: 10 ms to 500 ms
Full integration suite: under 5 minutes ideally

Tests slower than that get run less often, which defeats the purpose.

Common speed killers:

Spinning up new database instances per test (use one shared instance with cleanup)
Setting up too much fixture data
Not using transactions or other rollback mechanisms
Network calls to slow external systems

Profile slow tests. The 80/20 rule: a few slow tests account for most of the suite time.
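One way to find the slow few is simply to time every test and sort. The helper below is a minimal sketch with hypothetical names (`time_tests`, `test_fast`, `test_slow`); if the suite runs under pytest, its built-in `--durations=N` option reports the slowest tests without any extra code.

```python
import time


def time_tests(tests):
    """Run each zero-argument test callable; return (name, seconds), slowest first."""
    timings = []
    for test in tests:
        start = time.perf_counter()
        test()
        timings.append((test.__name__, time.perf_counter() - start))
    return sorted(timings, key=lambda item: item[1], reverse=True)


def test_fast():
    assert 1 + 1 == 2


def test_slow():
    time.sleep(0.05)  # stands in for a slow database or network call


if __name__ == "__main__":
    for name, seconds in time_tests([test_fast, test_slow]):
        print(f"{seconds * 1000:8.1f} ms  {name}")
```

Sorting by duration puts the candidates for optimization at the top of the report, which is usually all the profiling a suite needs at first.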

Test Isolation

Isolation is the hardest part of integration testing. No test may depend on state left behind by another.

Strategies:

Transactions: each test runs in a transaction that's rolled back at the end
Truncation: delete relevant tables between tests
Unique data per test: each test uses isolated IDs, namespaces, or accounts
Fresh databases: new database per test (slow but cleanest)

Transactions are fast and clean for most cases. They don't work when the code under test commits its own transactions.
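The transaction strategy can be sketched as a small context manager. The names `make_shared_database` and `rolled_back` are hypothetical, and SQLite is again only a stand-in so the example runs as-is. Two "tests" insert the same unique email into one shared database, and neither sees the other's row.

```python
import sqlite3
from contextlib import contextmanager


def make_shared_database():
    # isolation_level=None puts sqlite3 in autocommit mode, so BEGIN/ROLLBACK are explicit.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
    return conn


@contextmanager
def rolled_back(conn):
    """Run a test body inside a transaction that is always rolled back."""
    conn.execute("BEGIN")
    try:
        yield conn
    finally:
        conn.execute("ROLLBACK")


def count_users(conn):
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


def run_two_tests_against_shared_db():
    conn = make_shared_database()
    seen = []
    for _ in range(2):
        with rolled_back(conn) as tx:
            # Same unique email both times; the rollback prevents a conflict.
            tx.execute("INSERT INTO users (email) VALUES ('sam@example.com')")
            seen.append(count_users(tx))
    leftover = count_users(conn)
    conn.close()
    return seen, leftover
```

Each pass sees exactly one row and the table is empty afterwards. If the code under test issued its own COMMIT inside the block, the final ROLLBACK would have nothing to undo, which is the limitation described above.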

Test Doubles for Externals

Even in integration tests, you don't include every external system. Typical lines to draw:

Real: database, cache, your own services
Real-but-isolated: message queues with test topics
Faked: in-memory equivalents for things that are expensive (S3 → in-memory storage)
Stubbed: third-party services with canned responses
Mocked (rarely): only when there is no other option

For third-party APIs, contract tests (verifying you produce/consume the agreed format) plus stubs in integration tests typically work better than calling the real third-party service.
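The difference between a fake and a stub is easier to see in code. In this sketch every class name (`InMemoryStorage`, `StubEmailVerifier`, `AvatarService`) is hypothetical: the fake is a working in-memory implementation of a storage interface, while the stub only returns canned answers in place of a third-party API.

```python
class InMemoryStorage:
    """Fake: a working in-memory stand-in for an object store such as S3."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = bytes(data)

    def get(self, key):
        return self._objects[key]


class StubEmailVerifier:
    """Stub: returns canned responses in place of a third-party API."""

    def __init__(self, valid_emails):
        self._valid = set(valid_emails)

    def is_deliverable(self, email):
        return email in self._valid


class AvatarService:
    """Hypothetical code under test that depends on both collaborators."""

    def __init__(self, storage, verifier):
        self.storage = storage
        self.verifier = verifier

    def upload_avatar(self, email, image_bytes):
        if not self.verifier.is_deliverable(email):
            raise ValueError(f"undeliverable email: {email}")
        key = f"avatars/{email}"
        self.storage.put(key, image_bytes)
        return key


def test_upload_avatar_stores_bytes():
    storage = InMemoryStorage()
    service = AvatarService(storage, StubEmailVerifier({"sam@example.com"}))
    key = service.upload_avatar("sam@example.com", b"\x89PNG")
    # The fake really stores data, so the test can read it back.
    assert storage.get(key) == b"\x89PNG"
```

Because the fake has real behavior, the test verifies the stored bytes rather than asserting that a particular method was called.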

API Integration Tests

A common shape: tests that exercise the HTTP API from request to response, with a real database but stubbed external services.

def test_post_user_creates_resource():
    # test_client is the web framework's test client, bound to the app under test
    response = test_client.post("/users", json={
        "name": "Sam",
        "email": "sam@example.com"
    })

    assert response.status_code == 201
    assert response.json()["id"]

    # Verify side effect
    user_id = response.json()["id"]
    fetched = test_client.get(f"/users/{user_id}")
    assert fetched.status_code == 200  # fail clearly if the resource is missing
    assert fetched.json()["email"] == "sam@example.com"

This style catches API contract issues, HTTP-layer concerns (status codes, content types, validation), and basic correctness, without the cost of full e2e tests.
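As a rough, standard-library-only illustration of how such a test client can work, the sketch below defines a tiny WSGI app and an in-process client that calls it directly. Everything here (`make_app`, `WsgiClient`, the routes, the schema) is an assumption for illustration, and SQLite is used only to keep it runnable. Many web frameworks ship an equivalent test client, so in practice you would use theirs.

```python
import io
import json
import sqlite3
from wsgiref.util import setup_testing_defaults


def make_app(db):
    """Hypothetical WSGI app with POST /users and GET /users/<id>."""

    def respond(start_response, status, payload):
        body = json.dumps(payload).encode()
        start_response(status, [("Content-Type", "application/json"),
                                ("Content-Length", str(len(body)))])
        return [body]

    def app(environ, start_response):
        method, path = environ["REQUEST_METHOD"], environ["PATH_INFO"]
        if method == "POST" and path == "/users":
            length = int(environ.get("CONTENT_LENGTH") or 0)
            data = json.loads(environ["wsgi.input"].read(length))
            cur = db.execute("INSERT INTO users (name, email) VALUES (?, ?)",
                             (data["name"], data["email"]))
            db.commit()
            return respond(start_response, "201 Created", {"id": cur.lastrowid})
        tail = path.rsplit("/", 1)[-1]
        if method == "GET" and path.startswith("/users/") and tail.isdigit():
            row = db.execute("SELECT name, email FROM users WHERE id = ?",
                             (int(tail),)).fetchone()
            if row:
                return respond(start_response, "200 OK",
                               {"id": int(tail), "name": row[0], "email": row[1]})
        return respond(start_response, "404 Not Found", {"error": "not found"})

    return app


class WsgiClient:
    """Tiny in-process client: builds a WSGI environ and calls the app directly."""

    def __init__(self, app):
        self.app = app

    def request(self, method, path, payload=None):
        body = json.dumps(payload).encode() if payload is not None else b""
        environ = {"REQUEST_METHOD": method, "PATH_INFO": path,
                   "CONTENT_LENGTH": str(len(body)), "wsgi.input": io.BytesIO(body)}
        setup_testing_defaults(environ)  # fills in the remaining required keys
        captured = {}

        def start_response(status, headers, exc_info=None):
            captured["status"] = int(status.split()[0])

        raw = b"".join(self.app(environ, start_response))
        return captured["status"], json.loads(raw)


def test_post_user_creates_resource():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
    client = WsgiClient(make_app(db))

    status, created = client.request("POST", "/users",
                                     {"name": "Sam", "email": "sam@example.com"})
    assert status == 201 and created["id"]

    status, fetched = client.request("GET", f"/users/{created['id']}")
    assert status == 200 and fetched["email"] == "sam@example.com"
```

Calling the app in-process skips the network socket but still exercises routing, request parsing, status codes, and the database, which is where most of the bugs this style targets tend to live.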

Test Data

A common stumbling block. Approaches:

Per-test data: each test creates the data it needs. Slow but clear.
Shared fixtures: common data loaded once. Fast but can produce hidden dependencies between tests.
Factories/builders: programmatic data generation with sensible defaults. Fast, clear, flexible.
Snapshots/seeds: pre-built database states. Brittle as the schema evolves.

Factories are usually the sweet spot. Each test creates the specific data it needs from the factory, with the factory handling sensible defaults.
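A factory can be as small as one function. The sketch below uses a hypothetical `User` dataclass and `make_user` helper: defaults cover every field, a counter keeps generated values unique, and each test overrides only the fields it cares about.

```python
import itertools
from dataclasses import dataclass

# Module-level counter so every generated user gets distinct values.
_sequence = itertools.count(1)


@dataclass
class User:
    name: str
    email: str
    role: str = "member"
    active: bool = True


def make_user(**overrides):
    """Build a User with sensible defaults; tests override only what matters to them."""
    n = next(_sequence)
    fields = {"name": f"User {n}", "email": f"user{n}@example.com"}
    fields.update(overrides)
    return User(**fields)


def test_inactive_admin():
    admin = make_user(role="admin", active=False)
    other = make_user()
    assert admin.role == "admin" and not admin.active
    # Emails are unique per call, so tests sharing a database do not collide.
    assert admin.email != other.email
```

The test states only what is relevant to it (an inactive admin), which keeps the intent readable and doubles as the "unique data per test" isolation strategy.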

What Not to Do

Integration tests as a replacement for unit tests. They're slower and more brittle. Use unit tests where they fit; integration where they're needed.

Integration tests with full stubs. If everything is stubbed, you're not testing integration; you're testing the stubs.

Tests dependent on test order. Brittle. Tests must be independent.

Tests with environmental coupling. "This test only passes on Linux." Reveal the coupling and remove it.

Integration tests that lie. Pretending to test the real database while actually using SQLite when production is Postgres. The differences will bite.

Flakiness in Integration Tests

Integration tests tend to be flakier than unit tests because there are more moving parts.

Common causes:

Timing assumptions (async operations without proper waits)
Test data conflicts (tests sharing state they shouldn't)
Slow database operations under load
Race conditions in shared infrastructure

Flaky integration tests should be quarantined immediately and then fixed. Once a team tolerates occasional failures, a red build stops meaning anything and the suite loses its value.
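For the timing category, a common fix is to replace fixed sleeps with a bounded poll. The helper below is a minimal sketch; `wait_until` and its default timeout and interval are assumptions, not a standard API.

```python
import threading
import time


def wait_until(predicate, timeout=2.0, interval=0.01):
    """Poll predicate until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()  # one last check at the deadline


def test_background_job_completes():
    done = threading.Event()
    # Stands in for an async operation with unpredictable latency.
    threading.Timer(0.05, done.set).start()

    # A fixed sleep(0.05) here would pass or fail depending on scheduling.
    assert wait_until(done.is_set, timeout=2.0)
```

The test finishes as soon as the condition holds, so it is both faster on a quiet machine and more tolerant on a loaded one than a hard-coded sleep.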

Key Takeaway

Integration tests verify components work together, catching bugs that unit tests with mocks miss. Most valuable when code interacts with databases, APIs, or other services. Aim for fast tests (under 500ms each, full suite under 5 minutes). Use real dependencies where possible; stub only external services. Keep tests isolated through transactions or per-test data. Treat flakiness as a defect. The "test trophy" with heavy integration coverage is often a better fit for modern services than the traditional pyramid.

ShiftQuality