Data Modeling for Distributed Systems

Contributor
Nov 25, 2025
5 min read

The previous post in this path covered the fundamental challenges of distributed systems — partial failures, consistency trade-offs, and the fallacies that make distributed engineering hard. This post addresses the most consequential design decision in any distributed system: how you model and partition your data.

In a monolithic application with a single database, data modeling is a well-understood discipline. Normalize your tables, define your relationships, and the database handles consistency, transactions, and joins. In a distributed system, there is no single database. Data is split across services. Joins cross network boundaries. Transactions span systems that have no shared transaction mechanism.

Every data modeling decision in a distributed system is a trade-off between consistency, performance, and coupling. Understanding these trade-offs — and making them deliberately — is what separates distributed systems that work from distributed systems that constantly surprise you.

Ownership Is the First Decision

Before you model the data, decide who owns it. Every piece of data in a distributed system must have exactly one authoritative source — one service that is the system of record for that data.

The customer's email address is owned by the user profile service. The order total is owned by the order service. The inventory count is owned by the inventory service. Other services may hold copies of this data, but the copy is derived from the source. When the data changes, the change originates at the owner.

Without clear ownership, you get conflicting writes. The billing service updates the customer's address. The profile service also updates the customer's address. Now there are two versions, neither knows about the other, and reconciliation is someone's weekend project.

Ownership is not about access control. Any service can read data it needs. Ownership is about write authority — one service is the source of truth, and changes flow outward from there.

Data Duplication Is Not a Bug

In a relational database, data duplication is a modeling error. In a distributed system, it is a design pattern.

When the order service needs to display the customer's name alongside the order, it has two options. It can call the customer service at query time to fetch the name. Or it can store a copy of the customer's name locally, updated via events when the name changes in the customer service.

The first option creates runtime coupling — the order service cannot display orders if the customer service is down. The second option creates data duplication — the customer's name exists in two places and can be temporarily inconsistent.

For most use cases, duplication with eventual consistency is the better trade-off. The order service can display orders independently of the customer service's availability. The name might be a few seconds stale during a name change, which is acceptable for nearly every application.

The rule: duplicate data that you need for queries. Keep the source of truth authoritative. Propagate changes through events. Accept that copies are eventually consistent. Design the user experience to tolerate the consistency window.

CQRS: Separate Reads From Writes

Command Query Responsibility Segregation separates the write model (commands) from the read model (queries). The write model is optimized for consistency and business rule enforcement. The read model is optimized for the specific queries your application needs.

In a distributed system, CQRS is particularly powerful because read and write patterns are often fundamentally different. Writes are infrequent, must be consistent, and enforce complex business rules. Reads are frequent, can tolerate slight staleness, and require data joined from multiple services.

The implementation: write operations go to the service that owns the data. When a write succeeds, an event is published. Read-optimized projections — denormalized views that combine data from multiple services — consume these events and update their local stores. Queries read from the projections, which are fast because they are pre-joined and pre-indexed for the specific access patterns.

The cost is complexity. You are maintaining two models — the write model and the read projections — and keeping them synchronized through events. The projections are eventually consistent. The infrastructure for event consumption and projection updates is non-trivial.

CQRS is worth the cost when read and write patterns are significantly different, when read performance is critical, or when queries need data from multiple services. It is over-engineering for simple CRUD applications where the read model and write model are identical.

Sagas: Transactions Across Services

In a monolithic database, a transaction ensures that multiple operations either all succeed or all fail. Transfer money between accounts: debit one, credit the other. If either fails, both are rolled back. The database guarantees this.

In a distributed system, there is no cross-service transaction. The order service and the payment service have separate databases. There is no mechanism that atomically commits to both.

A saga is the distributed alternative. Instead of one atomic transaction, a saga is a sequence of local transactions, each within a single service, coordinated through events. If a step fails, compensating transactions undo the previous steps.

Place an order: the order service creates the order (local transaction), then publishes "order created." The payment service processes the payment (local transaction), then publishes "payment succeeded." The inventory service reserves stock (local transaction), then publishes "stock reserved."

If the payment fails: the payment service publishes "payment failed." The order service executes a compensating transaction — canceling the order. The system returns to a consistent state without a distributed transaction.

Sagas require careful design. Every step must have a corresponding compensation. The compensation must be idempotent — executing it multiple times must produce the same result. The ordering of events must be handled — what if "payment failed" arrives before "order created"?

These are solvable problems, but they are problems that do not exist in a monolithic database. Sagas are the price of distributed data ownership.

Cross-Service Queries: The Hardest Problem

The most painful consequence of distributed data is the loss of joins. In a single database, joining orders with customers with products is a SQL query. In a distributed system, orders, customers, and products are in different services with different databases.

Several patterns address this.

API composition aggregates data at the API gateway or BFF (Backend-for-Frontend) layer. The gateway calls the order service, the customer service, and the product service, then assembles the response. This is simple but creates runtime coupling and latency that scales with the number of services called.

CQRS projections pre-join the data in a read-optimized store, as described above. Queries are fast because the data is already combined. The cost is the eventual consistency of the projections.

Data mesh patterns give each domain team responsibility for publishing their data as a product — a well-documented, well-maintained data set that other teams can consume. This is an organizational pattern as much as a technical one, but it addresses the long-term challenge of cross-domain data access.

There is no perfect solution. Each pattern trades one cost for another. The right choice depends on your consistency requirements, your latency budget, and your operational capacity.

The Takeaway

Data modeling in distributed systems is harder than data modeling in monolithic systems because every decision forces a trade-off between consistency, availability, coupling, and complexity.

Own your data clearly — one service is the source of truth for each piece of data. Duplicate strategically — copy data that you need for reads and keep it eventually consistent. Separate reads from writes when the patterns diverge. Use sagas for cross-service transactions. And choose your cross-service query pattern based on your specific constraints.

These are not best practices to follow blindly. They are trade-offs to make deliberately. The distributed system that works is the one where every trade-off was made with open eyes and documented reasoning.

Next in the "Systems Thinking" learning path: We'll cover resilience engineering — building systems that degrade gracefully under failure, recover automatically, and maintain acceptable service levels when individual components break.

ShiftQuality