Property-Based Testing: A Beginner's Tour

Contributor
Mar 22

Most tests are example-based: you write specific inputs and verify specific outputs. Property-based tests are different: you describe a property the code should have, and the testing framework generates many inputs to try to violate it. When it can't break your property after many attempts (Hypothesis, for example, tries 100 inputs per test by default), you have stronger evidence the property holds, though not a proof.

This guide covers the basic idea and where it pays off.

The Core Concept

Example-based test:

def test_reverse_twice_returns_original():
    assert reverse(reverse([1, 2, 3])) == [1, 2, 3]
    assert reverse(reverse([4, 5])) == [4, 5]
    assert reverse(reverse([])) == []

Three specific inputs. You picked them; they pass.

Property-based test:

from hypothesis import given
from hypothesis.strategies import integers, lists

@given(lists(integers()))
def test_reverse_twice_returns_original(xs):
    assert reverse(reverse(xs)) == xs

The framework generates many lists of integers — empty, small, large, with duplicates, with negative numbers, with extreme values — and verifies the property for each.

If any input violates the property, the framework reports it (often after "shrinking" to find the simplest counter-example).
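To make the mechanics concrete, here is a rough, framework-free sketch of the generate-and-check loop described above. The names check_property and random_int_list are invented for this illustration, and reverse is a stand-in for the function under test. Real frameworks such as Hypothesis pick inputs far more cleverly and also shrink failures, so treat this as a toy model, not as how any library is implemented.

```python
import random


def random_int_list(rng, max_len=20):
    """Hypothetical generator: a list of integers of random length."""
    length = rng.randint(0, max_len)
    return [rng.randint(-1000, 1000) for _ in range(length)]


def check_property(prop, generate, runs=200, seed=0):
    """Return the first generated input that violates `prop`, or None."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    for _ in range(runs):
        value = generate(rng)
        if not prop(value):
            return value
    return None


def reverse(xs):
    """Stand-in for the function under test."""
    return xs[::-1]


# A property that should hold: reversing twice gives back the original list.
holds = check_property(lambda xs: reverse(reverse(xs)) == xs, random_int_list)

# A property that is false in general: reversing once changes nothing.
broken = check_property(lambda xs: reverse(xs) == xs, random_int_list)
```

Here holds comes back as None because no generated list violated the first property, while broken is some generated list that is not a palindrome.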

What Properties Look Like

Common patterns:

Round-trip: parse(serialize(x)) == x
Idempotence: f(f(x)) == f(x)
Commutativity: f(a, b) == f(b, a)
Invariants: "the total is always positive," "the list is always sorted"
Comparative: "the new implementation gives the same result as the old one"

The property is the invariant that should hold for all valid inputs.
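As a sketch of how several of these patterns might look in Hypothesis, the tests below use standard-library functions (json, sorted, max) as stand-ins for your own code. The test names and the choice of functions are assumptions made for illustration only.

```python
import json

from hypothesis import given, strategies as st


# Round-trip: decoding what we encoded returns the original value.
@given(st.dictionaries(st.text(), st.integers()))
def test_json_round_trip(d):
    assert json.loads(json.dumps(d)) == d


# Idempotence: sorting an already sorted list changes nothing.
@given(st.lists(st.integers()))
def test_sorted_is_idempotent(xs):
    assert sorted(sorted(xs)) == sorted(xs)


# Commutativity: argument order does not matter.
@given(st.integers(), st.integers())
def test_max_is_commutative(a, b):
    assert max(a, b) == max(b, a)


# Invariant: every adjacent pair in the output is in order.
@given(st.lists(st.integers()))
def test_sorted_output_is_ordered(xs):
    out = sorted(xs)
    assert all(x <= y for x, y in zip(out, out[1:]))
```

Each test states one property and nothing else, which keeps a failure easy to interpret.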

When Property-Based Testing Shines

Particularly useful for:

Parsers and serializers. Round-trip properties catch a class of bug example tests miss.
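Data transformations. Properties such as "mapping f and then g equals mapping their composition" or "filtering never adds elements." (Note that "filter then map equals map then filter" holds only when the mapping does not change what the predicate sees.)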
Algorithms. Sort, merge, search — invariants are well-defined.
Math-heavy code. Numerical properties have natural invariants.
State machines. Properties like "applying any sequence of operations preserves invariants."

Less useful for:

UI code (hard to define properties)
Side-effect-heavy code (properties on what?)
Wire-format details (specifics matter more than properties)
Business rules without clear invariants

Shrinking

A core feature of property-based testing tools: when they find a counter-example, they shrink it.

If a test fails with input [3, 7, -1, 14, 0, 9], the framework tries to find a simpler failing input — maybe [3, -1] is enough, or even [-1].

Shrinking dramatically improves the debugging experience. The framework hands you a minimal failing case, not a complex one you have to reduce yourself.
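The idea can be sketched with a deliberately naive greedy shrinker for lists of integers. This is not how Hypothesis or any other library shrinks internally; shrink_list and has_negative are hypothetical names, and the "bug" is simply assumed to trigger whenever the list contains a negative number.

```python
def shrink_list(failing, still_fails):
    """Greedily simplify a failing list while `still_fails` stays true."""
    current = list(failing)
    progress = True
    while progress:
        progress = False
        # Pass 1: try dropping each element in turn.
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if still_fails(candidate):
                current = candidate
                progress = True
                break
        if progress:
            continue
        # Pass 2: try moving each element one step toward zero.
        for i, x in enumerate(current):
            if x == 0:
                continue
            smaller = x - 1 if x > 0 else x + 1
            candidate = current[:i] + [smaller] + current[i + 1:]
            if still_fails(candidate):
                current = candidate
                progress = True
                break
    return current


def has_negative(xs):
    """Assumed failure condition: the code under test breaks on any negative."""
    return any(x < 0 for x in xs)


minimal = shrink_list([3, 7, -1, 14, 0, 9], has_negative)
# minimal == [-1]
```

Every accepted step either removes an element or moves a value closer to zero, so the loop always terminates. Production shrinkers use many more strategies than these two passes.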

Tools

Popular libraries:

Python: Hypothesis
Haskell: QuickCheck (the original)
JavaScript: fast-check
Java: jqwik
Rust: proptest
Go: rapid

Hypothesis is particularly mature and well documented, which makes it a good starting point for learning the technique.

Writing Your First Property Test

Start small. Pick a function with a simple invariant.

from hypothesis import given, strategies as st

# add is the function under test; import it from your own module.

@given(st.integers(), st.integers())
def test_addition_commutative(a, b):
    assert add(a, b) == add(b, a)

Run it. If it passes, the property held for every generated input (Hypothesis tries 100 per test by default). If it fails, you get a counter-example.

Build from there. Framing good properties takes practice.

Combining With Example-Based Tests

Property-based and example-based tests complement each other.

Example-based: explicit verification of specific cases. Readable. Documents expected behavior.
Property-based: broad verification of invariants. Catches edge cases.

A good test suite often includes both. The example tests show what you intended; the property tests verify the intent holds in general.

Generators and Strategies

A key concept: how the framework generates inputs.

Most frameworks provide:

Primitive generators (integers, floats, strings, bytes)
Compound generators (lists, dicts, tuples)
Filtered generators (positive integers, valid emails)
Custom generators (your domain objects)

The quality of your tests depends partly on the quality of your generators. A generator that only produces small numbers won't find issues with large ones.
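In Hypothesis, generators are called strategies. The sketch below shows one possible example of each kind listed above. The Order class and the test names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass

from hypothesis import given, strategies as st

# Primitive and compound strategies.
small_ints = st.integers(min_value=-10, max_value=10)
int_lists = st.lists(st.integers(), max_size=50)
pairs = st.tuples(st.text(), st.floats(allow_nan=False))

# Filtered strategies: bounds are cheap, .filter() discards rejected values.
positive = st.integers(min_value=1)
even = st.integers().filter(lambda n: n % 2 == 0)


@dataclass
class Order:
    """Hypothetical domain object."""
    quantity: int
    sku: str


# Custom strategy: build domain objects from simpler strategies.
orders = st.builds(
    Order,
    quantity=st.integers(min_value=1, max_value=1000),
    sku=st.text(min_size=1),
)


@given(orders)
def test_order_quantity_is_positive(order):
    assert order.quantity >= 1


@given(even)
def test_even_numbers_are_even(n):
    assert n % 2 == 0
```

Where a constraint can be expressed as a bound (as with positive), that is usually preferable to .filter(), because a filter that rejects most values wastes generation effort.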

Stateful Property Testing

Beyond functional properties, frameworks like Hypothesis support stateful testing: generate sequences of operations and verify invariants hold throughout.

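from hypothesis import strategies as st
from hypothesis.stateful import RuleBasedStateMachine, invariant, rule

class ShoppingCartStateMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        # ShoppingCart is the class under test, assumed to be defined elsewhere.
        self.cart = ShoppingCart()

    @rule(item=st.text())
    def add_item(self, item):
        self.cart.add(item)

    @rule()
    def remove_item(self):
        if self.cart.count() > 0:
            self.cart.remove_last()

    @invariant()
    def cart_count_non_negative(self):
        assert self.cart.count() >= 0

# Expose the machine to pytest/unittest.
TestShoppingCart = ShoppingCartStateMachine.TestCase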

The framework generates random sequences of operations and verifies invariants. Particularly useful for testing classes with complex state.
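A self-contained variant you can run might look like the following. The ShoppingCart class here is a made-up stand-in, and the machine checks it against a plain Python list acting as a reference model, which is one common way to get a stronger invariant than "the count is non-negative." The settings values are arbitrary choices to keep the run short.

```python
from hypothesis import settings, strategies as st
from hypothesis.stateful import (
    RuleBasedStateMachine,
    invariant,
    precondition,
    rule,
)


class ShoppingCart:
    """Hypothetical cart used only for this illustration."""

    def __init__(self):
        self._items = []

    def add(self, item):
        self._items.append(item)

    def remove_last(self):
        return self._items.pop()

    def count(self):
        return len(self._items)


class CartMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.cart = ShoppingCart()
        self.model = []  # plain list acting as the reference model

    @rule(item=st.text())
    def add_item(self, item):
        self.cart.add(item)
        self.model.append(item)

    # Only generate a removal when there is something to remove.
    @precondition(lambda self: len(self.model) > 0)
    @rule()
    def remove_item(self):
        assert self.cart.remove_last() == self.model.pop()

    @invariant()
    def count_matches_model(self):
        assert self.cart.count() == len(self.model)


# Arbitrary limits chosen to keep the example fast.
CartMachine.TestCase.settings = settings(max_examples=25, stateful_step_count=20)

# pytest and unittest collect this generated TestCase class.
TestCart = CartMachine.TestCase
```

If the cart and the model ever disagree, the framework reports the sequence of rule calls that led there, shrunk to a short reproduction where possible.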

What Goes Wrong

Trivial properties. f(x) == f(x) always passes. Make sure the property is meaningful.

Tautological properties. Properties expressed in terms of the implementation. They pass because they restate the code.

Slow tests. Generating thousands of inputs takes time. Limit the input space or reduce the number of examples per test.

Misleading shrinks. Sometimes the shrunk input fails for a different reason than the original input did, so the minimal case points at a different bug than the one first found. Investigate before trusting.

Flaky tests. Because inputs are generated randomly, a rare failing input may turn up on some runs and not others, especially when the input space is large. Use deterministic seeds for CI.
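The tautological-property pitfall is easiest to see side by side. In this sketch, dedupe is a hypothetical function under test. The first test repeats its algorithm, so it would pass even if the algorithm were wrong in the same way. The second states what the result must look like without saying how to compute it.

```python
from hypothesis import given, strategies as st


def dedupe(items):
    """Hypothetical function under test: drop repeats, keep first occurrences."""
    seen = set()
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out


@given(st.lists(st.integers()))
def test_dedupe_tautological(xs):
    # Weak: re-implements the same loop, so any logic bug is copied into the test.
    seen, expected = set(), []
    for x in xs:
        if x not in seen:
            seen.add(x)
            expected.append(x)
    assert dedupe(xs) == expected


@given(st.lists(st.integers()))
def test_dedupe_meaningful(xs):
    out = dedupe(xs)
    assert len(out) == len(set(out))         # no duplicates remain
    assert set(out) == set(xs)               # nothing lost, nothing invented
    assert out == sorted(out, key=xs.index)  # first-occurrence order is kept
```

Both tests pass here because dedupe is correct. The difference shows up only when the implementation is wrong, which is exactly when you need the test.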

Determinism in CI

For CI, you want deterministic test runs. Two strategies:

Fixed seed: the framework uses the same seed each run, generating the same inputs
Replay failing cases: when a test fails, the failing input is recorded; subsequent runs always include it

Most frameworks support both. The combination gives you determinism plus accumulation of "interesting" inputs over time.
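In Hypothesis, one way to set this up is with settings profiles and explicit examples. The profile names, the example counts, and the use of a HYPOTHESIS_PROFILE environment variable are conventions assumed for this sketch, and the profile registration would normally live in a shared file such as conftest.py.

```python
import os

from hypothesis import example, given, settings, strategies as st

# "ci" derives its inputs deterministically; "dev" stays random but runs fewer cases.
settings.register_profile("ci", derandomize=True, max_examples=200)
settings.register_profile("dev", max_examples=50)

# Pick the profile from the environment, falling back to "dev".
settings.load_profile(os.getenv("HYPOTHESIS_PROFILE", "dev"))


@given(st.lists(st.integers()))
@example([])           # explicit examples always run, whatever the seed
@example([0, -1, 0])   # e.g. an input that once exposed a bug
def test_sort_keeps_length(xs):
    assert len(sorted(xs)) == len(xs)
```

Hypothesis also keeps a local database of failing examples and replays them on later runs. Pinning an important failure with @example makes it part of the test file itself, so it travels with the code.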

When to Reach for Property Testing

Triggers:

"We need to test a parser/serializer; round-trip property is natural"
"We're rewriting an algorithm; want to verify equivalence with the old version"
"We have a class with complex state and lots of operations"
"Example tests pass but bugs keep appearing in production"

In each case, properties capture what the code should do more broadly than examples.
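The rewrite trigger is often the cheapest to act on, because the old implementation serves as the oracle. A minimal sketch, assuming two hypothetical functions old_unique_sorted and new_unique_sorted that are meant to behave identically:

```python
from hypothesis import given, strategies as st


def old_unique_sorted(xs):
    """Stand-in for the trusted legacy implementation."""
    result = []
    for x in sorted(xs):
        if not result or result[-1] != x:
            result.append(x)
    return result


def new_unique_sorted(xs):
    """Stand-in for the rewritten implementation."""
    return sorted(set(xs))


@given(st.lists(st.integers()))
def test_new_matches_old(xs):
    # Comparative property: both versions agree on every generated input.
    assert new_unique_sorted(xs) == old_unique_sorted(xs)
```

This says nothing about whether the old behavior was right, only that the rewrite did not change it. That is usually the guarantee you want during a refactor.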

Limitations

Property-based testing isn't a universal answer:

Not all behavior has clean properties
Properties can be wrong (just like tests can be wrong)
Generators take effort to write well
Debugging property failures takes more thought than debugging example failures

Use where it fits. Don't force it.

Combining With Other Testing

As a rough rule of thumb rather than a measured figure, property tests in a typical codebase might account for:

5-15% of the test count
20-40% of the test value for the areas they apply to
Specific risky modules: parsers, transformers, complex state machines

The rest is example-based. The combination is more powerful than either alone.

Key Takeaway

Property-based testing describes invariants and lets the framework generate inputs to try to violate them. Particularly useful for parsers, transformations, algorithms, and stateful systems. Common properties: round-trip, idempotence, commutativity, invariants. Shrinking gives you minimal failing cases for debugging. Use alongside example-based tests, not as a replacement. Start small with a clear invariant; expand as the technique becomes familiar.

ShiftQuality