Tutorial 4: Build a Test Data Factory

Contributor
May 5

By your fifth test, you'll notice the same setup code repeated. By your twentieth, the repetition is painful. A test data factory eliminates the repetition while keeping each test's specific setup visible.

What You'll Build

Factory functions that produce test objects with sensible defaults you can override per test.

Step 1: Identify the Repetition (5 min)

Look at your existing tests. Where do you have setup code that varies in only minor ways?

# Test 1
user = User(name="Test User", email="test1@example.com", role="standard", verified=True)

# Test 2  
admin = User(name="Test Admin", email="admin@example.com", role="admin", verified=True)

# Test 3
unverified = User(name="Unverified", email="new@example.com", role="standard", verified=False)

Each test cares about one or two properties; the rest is boilerplate.

Step 2: Write the Basic Factory (15 min)

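# tests/factories.py
from uuid import uuid4

# User must also be imported from wherever your project defines it, e.g.:
# from myapp.models import User  # hypothetical path

def make_user(**overrides):
    # Defaults every test can rely on; the email is unique per call.
    defaults = {
        "name": "Test User",
        "email": f"test+{uuid4()}@example.com",
        "role": "standard",
        "verified": True,
    }
    defaults.update(overrides)  # per-test values replace the defaults
    return User(**defaults)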

Note the unique email: generating it with uuid4() prevents collisions between tests that create more than one user.
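To see the override behavior in isolation, here is a minimal self-contained sketch. The User dataclass is a stand-in assumed for illustration; substitute your real model.

```python
from dataclasses import dataclass
from uuid import uuid4


@dataclass
class User:
    """Stand-in model (an assumption); your real User will differ."""
    name: str
    email: str
    role: str
    verified: bool


def make_user(**overrides):
    defaults = {
        "name": "Test User",
        "email": f"test+{uuid4()}@example.com",
        "role": "standard",
        "verified": True,
    }
    defaults.update(overrides)  # caller-supplied values win over defaults
    return User(**defaults)


default_user = make_user()       # every field comes from the defaults
admin = make_user(role="admin")  # only the role differs
```

With this stand-in, a misspelled override such as rol="admin" fails immediately with a TypeError from the dataclass constructor, which is usually what you want from a factory.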

Step 3: Use It in Tests (10 min)

def test_admin_can_view_users():
    admin = make_user(role="admin")
    # ... test logic

def test_unverified_user_cannot_access_dashboard():
    user = make_user(verified=False)
    # ... test logic

Each test specifies only what matters to it. The defaults handle the rest.

Step 4: Add Factories for Related Objects (15 min)

If User has related objects such as Workspace or Post, write a factory for each:

def make_workspace(**overrides):
    defaults = {
        "name": f"Test Workspace {uuid4()}",
        "tier": "standard",
    }
    defaults.update(overrides)
    return Workspace(**defaults)

def make_user_in_workspace(workspace=None, **user_overrides):
    # Compare against None so a falsy-but-valid workspace is not replaced.
    if workspace is None:
        workspace = make_workspace()
    user = make_user(**user_overrides)
    workspace.add_member(user)
    return user, workspace
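A sketch of how the optional workspace argument might be used to put two users in the same workspace. The User and Workspace dataclasses, including the members list behind add_member, are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass
class User:  # stand-in model (assumption)
    name: str
    email: str


@dataclass
class Workspace:  # stand-in model (assumption)
    name: str
    tier: str = "standard"
    members: list = field(default_factory=list)

    def add_member(self, user):
        self.members.append(user)


def make_user(**overrides):
    defaults = {"name": "Test User", "email": f"test+{uuid4()}@example.com"}
    defaults.update(overrides)
    return User(**defaults)


def make_workspace(**overrides):
    defaults = {"name": f"Test Workspace {uuid4()}", "tier": "standard"}
    defaults.update(overrides)
    return Workspace(**defaults)


def make_user_in_workspace(workspace=None, **user_overrides):
    if workspace is None:
        workspace = make_workspace()
    user = make_user(**user_overrides)
    workspace.add_member(user)
    return user, workspace


# First call creates the workspace; second call reuses it.
alice, shared = make_user_in_workspace(name="Alice")
bob, same = make_user_in_workspace(workspace=shared, name="Bob")
```

Returning both objects keeps the relationship visible in the test instead of hiding the workspace inside the user.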

Step 5: Persist When Needed (10 min)

Integration tests need factories that also save the object to the database:

def create_user(db_session, **overrides):
    user = make_user(**overrides)
    db_session.add(user)
    db_session.commit()
    return user

Distinguish make_user (object only) from create_user (persisted). Tests use whichever they need.
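The difference between the two can be sketched without a real database. RecordingSession below is a hypothetical stand-in that only records calls; in a real suite db_session would be whatever session object your ORM provides.

```python
from dataclasses import dataclass
from uuid import uuid4


@dataclass
class User:  # stand-in model (assumption)
    name: str
    email: str


class RecordingSession:
    """Hypothetical stand-in for a database session; it only records calls."""

    def __init__(self):
        self.added = []
        self.commits = 0

    def add(self, obj):
        self.added.append(obj)

    def commit(self):
        self.commits += 1


def make_user(**overrides):
    defaults = {"name": "Test User", "email": f"test+{uuid4()}@example.com"}
    defaults.update(overrides)
    return User(**defaults)


def create_user(db_session, **overrides):
    user = make_user(**overrides)  # build in memory first
    db_session.add(user)
    db_session.commit()
    return user


session = RecordingSession()
in_memory = make_user(name="Unit")                    # never touches the session
persisted = create_user(session, name="Integration")  # one add, one commit
```

Because create_user is built on make_user, the defaults live in one place and the persisted variant stays a thin wrapper.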

Step 6: Use Faker for Realistic Data (10 min)

For tests that need varied realistic data:

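from faker import Faker

fake = Faker()
Faker.seed(1234)  # arbitrary fixed seed so a failing run can be replayed

def make_user(**overrides):
    defaults = {
        "name": fake.name(),
        "email": fake.unique.email(),  # .unique avoids repeated emails within a run
        "role": "standard",
        "verified": True,
    }
    defaults.update(overrides)
    return User(**defaults)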

Faker produces plausible names, emails, and similar values. It is useful when:

You want realistic-looking variation
You're testing display logic that needs diverse data

For most logic tests, stable test values are fine.
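One way to get stable values that are still unique is a simple counter, sketched here with the standard library only. The naming scheme (user1@example.com and so on) and the User dataclass are assumptions for illustration.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class User:  # stand-in model (assumption)
    name: str
    email: str
    role: str
    verified: bool


_user_numbers = count(1)  # module-level counter: 1, 2, 3, ...


def make_user(**overrides):
    n = next(_user_numbers)
    defaults = {
        "name": f"Test User {n}",
        "email": f"user{n}@example.com",
        "role": "standard",
        "verified": True,
    }
    defaults.update(overrides)
    return User(**defaults)


first = make_user()
second = make_user()
```

The numbers depend on how many users were created earlier in the same run, so tests should rely on the values being unique and readable rather than assert on a specific number.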

Step 7: Build Specialized Factories (10 min)

For common test scenarios:

def make_admin_user(**overrides):
    return make_user(role="admin", **overrides)

def make_unverified_user(**overrides):
    return make_user(verified=False, **overrides)

def make_premium_workspace(**overrides):
    return make_workspace(tier="premium", **overrides)

These wrap common patterns. Tests become even more readable.
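One detail worth checking in wrappers of this shape: when a wrapper passes its preset as a plain keyword next to **overrides, a caller who also passes that keyword gets a TypeError for a duplicate argument. The sketch below contrasts that with merging the preset into a dict so the caller's value wins. The User dataclass and the name make_admin_user_strict are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class User:  # stand-in model (assumption)
    name: str
    role: str
    verified: bool


def make_user(**overrides):
    defaults = {"name": "Test User", "role": "standard", "verified": True}
    defaults.update(overrides)
    return User(**defaults)


def make_admin_user_strict(**overrides):
    # Preset passed as a keyword: a caller-supplied role collides with it.
    return make_user(role="admin", **overrides)


def make_admin_user(**overrides):
    # Preset merged into a dict: a caller-supplied role replaces it.
    return make_user(**{"role": "admin", **overrides})


try:
    make_admin_user_strict(role="owner")
    collided = False
except TypeError:
    collided = True

owner = make_admin_user(role="owner")
```

Which behavior you want is a design choice. The strict form flags a test that asks for an admin and then overrides the role, while the merged form treats the preset as just another default.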

Step 8: Avoid the Over-Factory Trap (5 min)

Don't build factories for everything. Trade-offs:

Use a factory: when the object is created in multiple tests with mostly default values
Inline: when the test needs very specific values and a factory would just hide what matters
Skip: for one-off test data that won't be reused

Factories help with the common case. Don't force them.
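As a sketch of the inline case: when the exact value is the point of the test, constructing the object directly keeps that value in plain sight. The email_domain function and the User dataclass are hypothetical, introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class User:  # stand-in model (assumption)
    name: str
    email: str


def email_domain(user):
    """Hypothetical function under test: lowercased domain of the email."""
    return user.email.rsplit("@", 1)[1].lower()


def test_email_domain_is_lowercased():
    # The exact email is what the test is about, so it is spelled out inline.
    user = User(name="Test User", email="Someone@Example.COM")
    assert email_domain(user) == "example.com"


test_email_domain_is_lowercased()
```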

Step 9: Test the Factories (5 min)

A quick test confirms that the factory still works:

def test_factory_produces_valid_user():
    user = make_user()
    assert user.email
    assert "@" in user.email
    assert user.role == "standard"

This catches factory regressions when models change, such as a new required field that the factory does not supply.
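If you want slightly more coverage, two further checks are cheap: that overrides are applied, and that generated emails do not repeat. A minimal sketch, assuming the uuid-based factory from Step 2 and a stand-in User dataclass:

```python
from dataclasses import dataclass
from uuid import uuid4


@dataclass
class User:  # stand-in model (assumption)
    name: str
    email: str
    role: str
    verified: bool


def make_user(**overrides):
    defaults = {
        "name": "Test User",
        "email": f"test+{uuid4()}@example.com",
        "role": "standard",
        "verified": True,
    }
    defaults.update(overrides)
    return User(**defaults)


def test_factory_applies_overrides():
    user = make_user(role="admin", verified=False)
    assert user.role == "admin"
    assert user.verified is False
    assert user.name == "Test User"  # untouched default


def test_factory_emails_are_unique():
    emails = {make_user().email for _ in range(50)}
    assert len(emails) == 50


test_factory_applies_overrides()
test_factory_emails_are_unique()
```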

What You Just Did

You replaced verbose test setup with concise factory calls. Each test now reads as "specific thing I care about" rather than "ten lines of irrelevant boilerplate."

Common Failure Modes

Mega-factory. One factory function tries to build everything. Use small, focused factories instead.

Hidden coupling. Default values become assumptions that tests depend on without stating them, so tests fail mysteriously when a default changes. If a test depends on a value, pass it explicitly.

Factories that hit the DB unnecessarily. Use object-only factories for unit tests and persisting factories for integration tests.

Stale factories. The model gains a required field but the factory is not updated, and tests fail confusingly.

Faker for everything. Random data in tests makes failures hard to reproduce. Use deterministic defaults for logic tests.
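For the stale-factory failure mode, one possible guard is a check that the factory's default keys cover every required field of the model. The sketch below only works for dataclass models and uses hypothetical names (FACTORY_DEFAULT_KEYS, missing_factory_defaults, and the newly added timezone field); other model types would need their own way to list required fields.

```python
from dataclasses import MISSING, dataclass, fields


@dataclass
class User:  # stand-in model (assumption)
    name: str
    email: str
    role: str
    verified: bool
    timezone: str  # newly added required field the factory does not know about


# Keys the factory currently supplies (kept next to the factory in practice).
FACTORY_DEFAULT_KEYS = {"name", "email", "role", "verified"}


def missing_factory_defaults(model, default_keys):
    """Return required dataclass fields that the factory does not supply."""
    required = {
        f.name
        for f in fields(model)
        if f.default is MISSING and f.default_factory is MISSING
    }
    return sorted(required - set(default_keys))


missing = missing_factory_defaults(User, FACTORY_DEFAULT_KEYS)
```

A test asserting that the returned list is empty would then name the missing field directly instead of failing with a less obvious constructor error.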

Next Tutorial

With tests written, the next step is to run them automatically: Tutorial 5: Set Up CI to Run Tests.

ShiftQuality