Testing the Hard Parts: Async, External Dependencies, and State

ShiftQuality Contributor
Sep 28, 2025

The previous post in this path covered testing strategies that catch real bugs — risk-based testing, behavior-focused assertions, edge cases, and integration boundaries. This post goes deeper into the specific technical challenges that make teams say "we can't test that."

You can. It just requires different techniques than testing a function that takes input and returns output.

The three hardest testing problems in most codebases are async operations, external dependencies, and stateful workflows. Each one resists straightforward unit testing. Each one has practical solutions that don't require elaborate test infrastructure.

Testing Async Operations

Async code introduces timing — and timing introduces a category of bugs that synchronous tests cannot reproduce. A race condition, a missed await, a callback that fires after the test completes. These are real bugs that cause real production failures, and they require intentional testing strategies.

The first rule: never use arbitrary delays in tests. sleep(2) is not a testing strategy. It is a prayer that two seconds is long enough: too short on a loaded CI machine, wasted time everywhere else, and either way it makes your test suite slow. Instead, wait for conditions. Poll for the expected state with a timeout, or use testing utilities that resolve when async operations complete.
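
As a minimal sketch of waiting for a condition, the helper below polls until a predicate holds or a timeout expires. The names `wait_until`, `demo`, and `background_job` are illustrative assumptions, not part of any framework, and the short `asyncio.sleep(interval)` inside the loop is a polling interval, not a guess about how long the work takes.

```python
import asyncio
import time


async def wait_until(condition, timeout=1.0, interval=0.01):
    """Poll `condition` until it returns True; fail once `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise AssertionError(f"condition not met within {timeout}s")
        await asyncio.sleep(interval)  # yield so background tasks can make progress


async def demo():
    events = []

    async def background_job():
        await asyncio.sleep(0.05)  # simulated work of unknown duration
        events.append("done")

    task = asyncio.create_task(background_job())
    # Returns as soon as the job has finished, instead of sleeping a fixed 2 seconds.
    await wait_until(lambda: events == ["done"])
    await task
    return events
```

The test finishes as soon as the condition is true, and a genuine failure surfaces as a clear timeout error rather than a stale assertion.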

For async functions that return promises or tasks, the approach is straightforward: make the test itself async, await the operation, and assert on the resolved value. Most modern test frameworks support this natively or through a plugin (pytest, for example, relies on a plugin such as pytest-asyncio).

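import pytest

@pytest.mark.asyncio  # assumes the pytest-asyncio plugin is installed
async def test_order_processing_sends_confirmation():
    # mock_repo, mock_emailer, and test_order are test doubles and data defined elsewhere
    service = OrderService(mock_repo, mock_emailer)
    result = await service.process_order(test_order)

    # Assert on the resolved outcome, not on timing
    assert result.success is True
    assert mock_emailer.last_sent_to == "customer@example.com"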

For event-driven or callback-based code, the pattern is similar but requires capturing the callback. Provide a test double that records invocations, trigger the operation, and assert that the callback was called with the expected arguments.
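
The recording pattern described here can be sketched as follows. `RecordingCallback` and `Downloader` are hypothetical names invented for illustration; the standard library's `unittest.mock.Mock` offers the same recording behavior if you prefer not to write your own double.

```python
class RecordingCallback:
    """Test double that records every invocation (hypothetical helper)."""

    def __init__(self):
        self.calls = []

    def __call__(self, *args, **kwargs):
        self.calls.append((args, kwargs))


class Downloader:
    """Stand-in for callback-based code under test (hypothetical)."""

    def __init__(self, on_complete):
        self._on_complete = on_complete

    def fetch(self, url):
        # Real code would perform I/O here; this sketch only fires the callback.
        self._on_complete(url, status=200)


def test_fetch_invokes_callback_with_url_and_status():
    on_complete = RecordingCallback()
    Downloader(on_complete).fetch("https://example.com/report")

    # Exactly one call, with the expected positional and keyword arguments.
    assert on_complete.calls == [
        (("https://example.com/report",), {"status": 200})
    ]
```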

The hard case is concurrent operations — multiple async tasks that interact with shared state. These require tests that explicitly exercise the concurrent scenario: start both operations, let them interleave, and verify that the shared state is consistent. This is where race conditions hide, and finding them requires tests that are designed to provoke the race.
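
One way to provoke a race deliberately is sketched below. It assumes a hypothetical `Inventory` class whose reservation logic yields control between reading and writing the stock. The explicit `await asyncio.sleep(0)` stands in for whatever I/O await the real code would have at that point.

```python
import asyncio


class Inventory:
    """Hypothetical shared state with a yield point between read and write."""

    def __init__(self, stock):
        self.stock = stock
        self._lock = asyncio.Lock()

    async def reserve_unsafe(self):
        current = self.stock
        await asyncio.sleep(0)  # yield point: another task can run here
        if current > 0:
            self.stock = current - 1
            return True
        return False

    async def reserve_safe(self):
        async with self._lock:  # the read-check-write sequence is now exclusive
            return await self.reserve_unsafe()


async def reserve_concurrently(method_name, stock=1, buyers=2):
    """Start all buyers at once and report (reservations granted, stock left)."""
    inventory = Inventory(stock)
    method = getattr(inventory, method_name)
    results = await asyncio.gather(*(method() for _ in range(buyers)))
    return sum(results), inventory.stock


def test_concurrent_reservations_never_oversell():
    granted, remaining = asyncio.run(reserve_concurrently("reserve_safe"))
    assert granted == 1
    assert remaining == 0
```

Pointing the same test at `reserve_unsafe` makes it fail, because both buyers read the stock before either writes it and both reservations are granted for a single item. The useful assertion is the invariant (reservations granted never exceed stock), not the order in which the tasks happened to run.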

Testing External Dependencies

Your code calls a payment API, a third-party geocoding service, a database, and an email provider. You cannot — and should not — call these in unit tests. External calls are slow, flaky, expensive, and non-deterministic. A test that fails because Stripe's sandbox is down is not testing your code. It is testing Stripe's uptime.

The solution is test doubles, and choosing the right type of double matters.

Stubs return canned responses. When the code calls the payment API, the stub returns a predetermined success or failure response. Stubs are simple to build and appropriate when you just need the dependency to return a value so the rest of the code can proceed.
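
A stub can be as small as the sketch below. `ChargeResult`, `StubPaymentGateway`, and `checkout` are hypothetical names chosen for illustration and do not correspond to any real payment SDK.

```python
from dataclasses import dataclass


@dataclass
class ChargeResult:
    approved: bool
    reason: str = ""


class StubPaymentGateway:
    """Stub: always returns the canned result it was constructed with."""

    def __init__(self, result):
        self._result = result

    def charge(self, amount_cents, card_token):
        return self._result


def checkout(gateway, amount_cents, card_token):
    """Hypothetical code under test: maps a charge result to an order status."""
    result = gateway.charge(amount_cents, card_token)
    return "confirmed" if result.approved else f"declined: {result.reason}"


def test_checkout_reports_decline_reason():
    gateway = StubPaymentGateway(
        ChargeResult(approved=False, reason="insufficient funds")
    )
    assert checkout(gateway, 2500, "tok_test") == "declined: insufficient funds"
```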

Mocks verify interactions. Not only does the mock return a canned response, it records that it was called and allows you to assert on how it was called — with what arguments, how many times, in what order. Mocks are appropriate when the interaction itself is the behavior you are testing.
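
For an interaction-focused test, Python's standard `unittest.mock.Mock` is usually enough. In this sketch, `send_receipt` and the emailer's `send` method are assumptions made for illustration.

```python
from unittest.mock import Mock


def send_receipt(emailer, order_id, address):
    """Hypothetical code under test: sends exactly one receipt per order."""
    emailer.send(to=address, subject=f"Receipt for order {order_id}")


def test_send_receipt_emails_customer_once():
    emailer = Mock()

    send_receipt(emailer, order_id=42, address="customer@example.com")

    # The interaction itself is the behavior: one call, with these arguments.
    emailer.send.assert_called_once_with(
        to="customer@example.com", subject="Receipt for order 42"
    )
```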

Fakes are lightweight implementations that behave like the real thing but without the infrastructure. An in-memory database, a local file system wrapper, a simulated message queue. Fakes are more work to build but more realistic than stubs, and they catch a broader category of bugs because the fake exercises more of the interaction contract.
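
A fake might look like the following sketch of an in-memory repository. The `add` and `find` interface and the `register` function are hypothetical. The point is that the fake enforces a real rule (unique emails), so the test can catch a bug that a stub returning canned values would hide.

```python
class InMemoryUserRepository:
    """Fake: real behavior (uniqueness, lookups) without a database."""

    def __init__(self):
        self._users_by_email = {}

    def add(self, email, name):
        if email in self._users_by_email:
            raise ValueError(f"duplicate email: {email}")
        self._users_by_email[email] = name

    def find(self, email):
        return self._users_by_email.get(email)


def register(repo, email, name):
    """Hypothetical code under test: normalizes the email before storing."""
    repo.add(email.strip().lower(), name)


def test_register_rejects_duplicate_after_normalization():
    repo = InMemoryUserRepository()
    register(repo, "Ada@Example.com", "Ada")
    try:
        register(repo, " ada@example.com ", "Ada again")
    except ValueError:
        pass
    else:
        raise AssertionError("expected duplicate email to be rejected")
    assert repo.find("ada@example.com") == "Ada"
```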

The common mistake is over-mocking: replacing so many dependencies with mocks that the test verifies nothing except the wiring between mocks. If your test has more mock setup than assertions, it is testing the test, not the code.

The guideline: mock at the boundary, not throughout. Replace the HTTP client, not the service that calls it. Replace the database connection, not the repository. The business logic between the entry point and the boundary is the code under test. The boundary itself is the seam where test doubles plug in.
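
To illustrate the guideline, the sketch below replaces only the HTTP client and lets the service logic run for real. `GeocodingService`, its `/reverse` endpoint, and the dictionary-shaped response are assumptions made for the example, not a real geocoding API.

```python
class GeocodingService:
    """Hypothetical service under test; the HTTP client is its boundary."""

    def __init__(self, http_client):
        self._http = http_client

    def city_for(self, lat, lon):
        response = self._http.get("/reverse", params={"lat": lat, "lon": lon})
        if response["status"] != 200:
            return None
        return response["body"]["city"].title()


class StubHttpClient:
    """Replaces only the boundary; GeocodingService logic runs unmodified."""

    def __init__(self, response):
        self._response = response
        self.requests = []

    def get(self, path, params):
        self.requests.append((path, params))
        return self._response


def test_city_name_is_title_cased():
    http = StubHttpClient({"status": 200, "body": {"city": "new york"}})
    assert GeocodingService(http).city_for(40.71, -74.0) == "New York"


def test_non_200_response_yields_none():
    http = StubHttpClient({"status": 503, "body": {}})
    assert GeocodingService(http).city_for(0.0, 0.0) is None
```

Had the test replaced `GeocodingService` itself, neither the status handling nor the name formatting would have been exercised.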

Testing Stateful Workflows

A shopping cart starts empty, items are added, quantities are updated, a discount is applied, and the order is placed. Each step depends on the state produced by the previous step. Testing the final step — placing the order — requires setting up all the preceding state.

Stateful workflows are hard to test because the setup is expensive and the number of possible state combinations is large. A cart with three items, one of which is out of stock, with a percentage discount that has a minimum threshold, in a currency that requires specific rounding — that is a specific state that exercises a specific code path, and constructing it manually for every test is tedious.

Builder patterns reduce setup cost. Instead of constructing state manually in every test, create a builder that produces common starting states:

cart = (CartBuilder()
    .with_item("SKU-001", quantity=2, price=29.99)
    .with_item("SKU-002", quantity=1, price=49.99)
    .with_discount(percent=15, minimum=50.00)
    .build())

The builder encapsulates the construction logic and makes tests readable — you see what state was created, not how.
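
A builder supporting the calls used above could be sketched as follows. The `Cart` fields and the discount rule (a percentage applied once the subtotal reaches the minimum) are assumptions, and prices are converted to `Decimal` so that money arithmetic does not pick up binary floating-point artifacts.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class Cart:
    items: list = field(default_factory=list)  # (sku, quantity, unit_price)
    discount_percent: Decimal = Decimal("0")
    discount_minimum: Decimal = Decimal("0")

    def subtotal(self):
        return sum((qty * price for _, qty, price in self.items), Decimal("0"))

    def total(self):
        amount = self.subtotal()
        if self.discount_percent and amount >= self.discount_minimum:
            amount -= amount * self.discount_percent / 100
        return amount.quantize(Decimal("0.01"))  # round to whole cents


class CartBuilder:
    def __init__(self):
        self._cart = Cart()

    def with_item(self, sku, quantity, price):
        # Decimal(str(...)) keeps 29.99 as exactly 29.99.
        self._cart.items.append((sku, quantity, Decimal(str(price))))
        return self  # returning self is what enables chaining

    def with_discount(self, percent, minimum=0):
        self._cart.discount_percent = Decimal(str(percent))
        self._cart.discount_minimum = Decimal(str(minimum))
        return self

    def build(self):
        return self._cart
```

Tests that need a slightly different starting state chain one more `with_...` call instead of duplicating the whole setup.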

State machine testing treats the workflow as a state machine with defined transitions. Each transition has a precondition (what state must exist), an action (what happens), and a postcondition (what state results). Tests verify each transition independently:

Given an empty cart, adding an item produces a cart with one item. Given a cart with items, applying a discount updates the total. Given a cart with a discount and valid payment, placing the order succeeds.

Each test is small, focused, and exercises one transition. The combinatorial explosion is managed because you test transitions, not complete paths.
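
The transition-per-test idea can be sketched like this. The simplified `Cart` below is hypothetical, with prices in integer cents. Note that each test constructs its precondition directly rather than replaying the earlier steps of the workflow.

```python
class Cart:
    """Hypothetical, simplified cart used to illustrate transition tests."""

    def __init__(self, items=None, discount_percent=0, placed=False):
        self.items = dict(items or {})  # sku -> unit price in cents
        self.discount_percent = discount_percent
        self.placed = placed

    def add_item(self, sku, price_cents):
        self.items[sku] = price_cents

    def apply_discount(self, percent):
        if not self.items:
            raise ValueError("cannot discount an empty cart")
        self.discount_percent = percent

    def total_cents(self):
        return sum(self.items.values()) * (100 - self.discount_percent) // 100

    def place_order(self, payment_ok):
        if not self.items or not payment_ok:
            return False
        self.placed = True
        return True


def test_adding_item_to_empty_cart():
    cart = Cart()  # precondition: empty cart
    cart.add_item("SKU-001", 2999)  # action
    assert cart.items == {"SKU-001": 2999}  # postcondition


def test_discount_updates_total():
    cart = Cart(items={"SKU-001": 10000})  # precondition: cart with items
    cart.apply_discount(15)
    assert cart.total_cents() == 8500


def test_order_with_valid_payment_succeeds():
    cart = Cart(items={"SKU-001": 10000}, discount_percent=15)
    assert cart.place_order(payment_ok=True) is True
    assert cart.placed is True
```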

Contract Testing at Integration Boundaries

When your service depends on another team's API, the biggest testing risk is that the API changes without your tests knowing. Your mocks reflect the old contract. Your tests pass. Production fails.

Contract testing solves this by making the API contract explicit and testable. The API provider publishes a contract — the expected request format and the guaranteed response format. Your tests verify that your code produces valid requests and correctly handles the contract's responses. The provider's tests verify that their API satisfies the contract.

When the contract changes, both sides find out through failing tests. Your tests fail against the new contract before the change reaches production. If the provider breaks the contract, the provider's own tests fail before the deployment reaches production.
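
The mechanics can be illustrated with a deliberately hand-rolled sketch. Everything here is an assumption made for the example: the contract is a plain dictionary of required fields, and `display_name` and `provider_handler` are hypothetical consumer and provider code. Dedicated tools such as Pact automate the same idea with much richer matching.

```python
# Hypothetical shared contract for a "get user" response. In practice it would
# live in a file or registry that both teams' test suites read.
USER_CONTRACT = {
    "required_fields": {"id": int, "email": str, "active": bool},
}


def satisfies_contract(payload, contract):
    """True if every required field is present with the expected type."""
    return all(
        name in payload and isinstance(payload[name], expected)
        for name, expected in contract["required_fields"].items()
    )


def example_response(contract):
    """Build a minimal contract-valid payload for consumer-side tests."""
    samples = {int: 1, str: "user@example.com", bool: True}
    return {name: samples[kind] for name, kind in contract["required_fields"].items()}


# Consumer side: our code must handle any response the contract allows.
def display_name(payload):
    return payload["email"].split("@")[0]


def test_consumer_handles_contract_response():
    assert display_name(example_response(USER_CONTRACT)) == "user"


# Provider side: the handler's real output must satisfy the same contract.
def provider_handler(user_id):
    return {"id": user_id, "email": "ada@example.com", "active": True, "plan": "pro"}


def test_provider_satisfies_contract():
    assert satisfies_contract(provider_handler(7), USER_CONTRACT)
```

Because both test suites read the same contract, renaming `email` on the provider side fails the provider's test, and changing the contract itself regenerates the consumer's example response, so the consumer's test fails before production does. Extra fields such as `plan` are allowed in this sketch, which keeps additive changes non-breaking.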

This sounds like extra work, but it prevents one of the most common categories of production incident in microservice architectures: "their API changed and our service broke." The investment in contract testing can pay for itself with the first incident it prevents.

The Takeaway

The hard parts of testing — async, external dependencies, and state — are not untestable. They require different techniques: awaiting instead of sleeping, test doubles at boundaries instead of throughout, builders for state setup, and contracts at integration points.

The teams that test effectively are not the ones with the easiest code to test. They are the ones who invested in the patterns that make hard code testable — and those patterns, not coincidentally, also produce better-designed code.

Next in the "Testing That Matters" learning path: We'll cover testing in CI/CD — how to structure your test suite for fast feedback in the pipeline, manage flaky tests, and make your test results actionable instead of ignored.