Database Scaling Patterns That Match Your Actual Problem
- Contributor
- Nov 19, 2025
- 5 min read
Your database is the bottleneck. Queries are slow. Connection counts are climbing. The CPU on the database server is pegged. You have read about sharding, and it sounds like the answer.
It almost certainly is not. Not because sharding is wrong, but because it is the nuclear option — complex, expensive, and almost never the first thing you need. Teams that jump to sharding skip the simpler, cheaper, and more effective patterns that would have solved their problem with a fraction of the operational complexity.
The previous post in this path covered caching strategies. This post covers what to do when the database itself is the constraint — starting with the cheapest fixes and escalating only when they are exhausted.
Step Zero: Are You Sure It's the Database?
Before you scale the database, verify that the database is actually the problem. A slow API response that includes a database query is not necessarily a slow query — it might be slow serialization, slow network, or slow application code.
Measure. Profile. Trace.
Identify the specific queries that are slow. Most databases include query logging and slow query analysis tools. PostgreSQL has pg_stat_statements. MySQL has the slow query log. Use them. In most applications, 90% of the database load comes from fewer than 10% of the queries. Find those queries first.
Many "scaling problems" dissolve when you add a missing index, rewrite a poorly constructed query, or fix an N+1 problem in the application layer. These are not scaling solutions. They are correctness fixes. But they often provide 10x or 100x improvement for zero infrastructure changes.
Pattern 1: Connection Pooling
Database connections are expensive to create. Each connection consumes memory on the database server, holds open a network socket, and requires authentication overhead. An application that opens a new connection for every request and closes it afterward is burning resources on connection management instead of query execution.
Connection pooling maintains a set of open, reusable connections. When the application needs a database connection, it borrows one from the pool. When the query completes, the connection is returned to the pool, not closed. The overhead of creating and destroying connections is eliminated.
For applications that are not already using connection pooling — and this includes many early-stage applications using default ORM settings — adding a connection pool is often the single most impactful database performance improvement. PgBouncer for PostgreSQL, ProxySQL for MySQL, or the built-in connection pooling in your ORM (EF Core, Django, SQLAlchemy) are all effective.
The more connections you have, the more important pooling becomes. A server with 100 concurrent requests that each opens a database connection needs 100 connections. With a pool of 20 connections and proper queueing, the same server handles the same load with 80% fewer database connections.
Pattern 2: Read Replicas
Most web applications are read-heavy. The ratio of reads to writes is often 10:1 or higher. Read replicas exploit this by offloading read queries to copies of the database that receive writes from the primary.
The primary database handles all writes. One or more replicas receive a continuous stream of changes from the primary and handle read queries. The read load is distributed across replicas. The primary server is freed to focus on writes.
Adding a read replica is operationally straightforward with managed databases — a few clicks in AWS RDS, Google Cloud SQL, or Azure Database. The application change is also straightforward: route write queries to the primary and read queries to a replica.
The catch is replication lag. Changes written to the primary take a small amount of time — typically milliseconds to seconds — to propagate to replicas. During that window, a read from the replica returns stale data. For most read operations, this is acceptable. For reads that must reflect the most recent write (like reading a record immediately after creating it), the query should go to the primary.
Read replicas scale the read side of your database linearly. Need more read capacity? Add another replica. This pattern handles the scaling needs of the vast majority of web applications without any changes to the data model.
Pattern 3: Query Optimization and Denormalization
Before adding infrastructure, optimize the queries that hit the infrastructure you have.
Index analysis is the highest-return optimization. A query that scans a million-row table to find 10 matching records takes seconds. The same query with an appropriate index takes milliseconds. Use your database's EXPLAIN command to understand how queries execute. Look for sequential scans on large tables. Add indexes for the columns in your WHERE clauses, JOIN conditions, and ORDER BY clauses.
Denormalization trades storage for speed. A normalized schema requires joins to assemble related data. Joins are expensive when the tables are large. Storing precomputed data — a customer's order count alongside the customer record, a product's average rating alongside the product record — eliminates the join at read time.
Denormalization adds write complexity (you must update the precomputed value when the source data changes) and risks inconsistency (the precomputed value can drift from the source). But for read-heavy, performance-sensitive paths, the trade-off is often worthwhile.
Materialized views provide denormalization without modifying the base tables. A materialized view is a precomputed query result stored as a table. The database refreshes it on a schedule or on demand. Reporting queries that aggregate millions of rows can read from the materialized view in milliseconds instead of computing the aggregation in real time.
Pattern 4: Partitioning
When a single table grows too large for efficient querying — billions of rows, terabytes of data — partitioning splits it into smaller, manageable pieces.
Time-based partitioning splits a table by date range. Queries for "last 30 days of orders" only scan the relevant partitions instead of the entire table. Old partitions can be archived or dropped efficiently.
Key-based partitioning splits a table by a key attribute. Orders for customer IDs 1-1000 in one partition, 1001-2000 in another. Queries that include the partition key are routed to the relevant partition.
Partitioning is transparent to the application — queries work the same way, but the database scans less data. It is the right pattern when individual tables exceed the size that indexes can efficiently handle, which is typically in the hundreds of millions of rows.
Pattern 5: Sharding (When You've Earned It)
Sharding distributes data across multiple independent database servers. Each shard holds a subset of the data, and the application (or a proxy) routes queries to the correct shard.
Sharding is the most powerful scaling pattern and the most complex. It introduces routing logic, cross-shard query challenges, rebalancing complexity, and operational overhead that is a full-time job for a database team.
Before you shard: have you optimized your queries? Added indexes? Implemented connection pooling? Added read replicas? Partitioned large tables? Implemented caching? If you have done all of these and the database is still the bottleneck, sharding may be necessary. If you have not, start with the cheaper solutions.
The Takeaway
Database scaling is a ladder, not a leap. Start at the bottom — query optimization, connection pooling, indexes. Move to read replicas and caching. Proceed to partitioning. Arrive at sharding only when the simpler patterns are exhausted.
Each rung of the ladder is cheaper and less risky than the one above it. Most applications never need to climb past read replicas. The ones that do should arrive at sharding having already extracted every ounce of performance from the simpler patterns.
Match the solution to the problem. Measure before you scale. And never shard a database you haven't indexed.
Next in the "Production Web Architecture" learning path: We'll cover observability for web applications — how to instrument your application so you can see what it's actually doing in production, diagnose issues in minutes instead of hours, and build the feedback loop between production behavior and development decisions.


