Caching Strategies That Don't Bite You

Contributor
May 15

Caching is the first tool most teams reach for when performance degrades. The database query takes 200ms? Cache it. The API call is slow? Cache it. The page render is heavy? Cache it. The response time drops. Everyone celebrates.

Six weeks later, users report seeing stale data. A price update takes twenty minutes to appear. A customer's profile shows their old address. The support team asks engineering, "Why does the app show wrong information?" The answer is the cache: the same cache that made everything fast is now making everything wrong.

Caching trades correctness for speed. That trade-off is often worth it. But only if you understand what you are trading, how stale your data can be, and how you will handle the inevitable moment when the cache lies.

Why Caching Is Hard

The fundamental tension: caching works by storing a copy of data closer to where it is consumed. The moment you have two copies — the source and the cache — they can disagree. And they will, because the source changes and the cache does not know about it until you tell it.

This is the cache invalidation problem, and it is genuinely hard because the correct invalidation strategy depends on your specific data, your specific access patterns, and your specific tolerance for staleness. There is no universal answer, which is why every caching system eventually develops quirks that only one person on the team understands.

The secondary problem is cache stampedes. When a popular cached item expires, every concurrent request for that item hits the database simultaneously instead of being served from the cache. If the item is popular enough, this can overwhelm the database — the exact problem the cache was supposed to prevent.
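
One common way to blunt a stampede is to let only one request per key rebuild the entry while the others wait for its result. The sketch below is a minimal in-process illustration of that idea, not a production implementation; `SingleFlightCache` and `slow_db_lookup` are hypothetical names, and the sleep stands in for a slow database query.

```python
import threading
import time


class SingleFlightCache:
    """In-memory cache where only one thread per key runs the loader on a miss."""

    def __init__(self):
        self._data = {}
        self._key_locks = {}
        self._guard = threading.Lock()  # protects the _key_locks dict

    def _lock_for(self, key):
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def get(self, key, loader):
        if key in self._data:
            return self._data[key]
        with self._lock_for(key):
            # Re-check: another thread may have filled the entry while we waited.
            if key not in self._data:
                self._data[key] = loader(key)
            return self._data[key]


db_calls = 0
db_calls_lock = threading.Lock()


def slow_db_lookup(key):
    """Hypothetical stand-in for an expensive database query."""
    global db_calls
    with db_calls_lock:
        db_calls += 1
    time.sleep(0.05)
    return f"value-for-{key}"


cache = SingleFlightCache()
threads = [
    threading.Thread(target=cache.get, args=("hot-item", slow_db_lookup))
    for _ in range(20)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(db_calls)  # 1: twenty concurrent misses produced a single database call
```

Without the per-key lock, each of the twenty threads would have called `slow_db_lookup` on its own. A cache shared across processes would need a distributed equivalent of the lock, which this sketch does not attempt.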

The third problem is debugging. When data is wrong, the cache is always a suspect but never the only one. Is the stale data coming from the application cache, the CDN cache, the browser cache, the database query cache, or the API gateway cache? In a production system with multiple caching layers, tracing a staleness issue to its source is its own discipline.

Cache-Aside: The Default Pattern

Cache-aside — also called lazy loading — is the most common pattern and the one you should reach for first.

The application checks the cache for the requested data. If it is there (cache hit), return it. If not (cache miss), fetch from the source, store in the cache, and return. The cache is populated on demand, and the application controls both reads and writes.
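
The read path above can be sketched in a few lines. This is a minimal in-memory illustration, assuming a plain dict as the cache and a caller-supplied `load_from_source` function; the class name and the hit/miss counters are additions for the example, not part of any particular library.

```python
class CacheAside:
    """Cache-aside read path: check the cache, fall back to the source on a miss."""

    def __init__(self, load_from_source):
        self._cache = {}
        self._load = load_from_source  # hypothetical loader, e.g. a database query
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            self.hits += 1           # cache hit: serve the stored copy
            return self._cache[key]
        self.misses += 1             # cache miss: fetch, store, return
        value = self._load(key)
        self._cache[key] = value
        return value


profiles = {"user-1": {"city": "Lisbon"}}  # stand-in for the source of truth
cache = CacheAside(load_from_source=profiles.__getitem__)

cache.get("user-1")  # miss: loaded from the source and stored
cache.get("user-1")  # hit: served from the cache
print(cache.hits, cache.misses)  # 1 1
```

Only keys that are actually requested ever enter `_cache`, which is the "populated on demand" property.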

The advantage is simplicity. The application owns the caching logic. There is no background process to manage, no synchronization to worry about, and the cache only contains data that has actually been requested.

The disadvantage is that after a write to the source, the cache still holds the old value until it either expires or is explicitly invalidated. This creates a staleness window between the write and the next cache refresh.
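
To make the staleness window concrete, the sketch below writes to the source twice: once without touching the cache and once with an explicit invalidation. The dicts and function names are hypothetical stand-ins for a database and a cache client.

```python
source = {"sku-1": 100}  # hypothetical source of truth, e.g. a prices table
cache = {}


def read_price(sku):
    """Cache-aside read: populate the cache on a miss."""
    if sku not in cache:
        cache[sku] = source[sku]
    return cache[sku]


def write_price_without_invalidation(sku, price):
    source[sku] = price       # the cache still holds the old value


def write_price_with_invalidation(sku, price):
    source[sku] = price
    cache.pop(sku, None)      # evict so the next read refetches from the source


read_price("sku-1")                             # warms the cache with 100
write_price_without_invalidation("sku-1", 120)
stale = read_price("sku-1")                     # 100: the cache is now wrong
write_price_with_invalidation("sku-1", 130)
fresh = read_price("sku-1")                     # 130: the eviction forced a refetch
print(stale, fresh)  # 100 130
```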

For many use cases — product catalogs, user profiles, reference data — a staleness window of seconds to minutes is acceptable. For others — inventory counts, account balances, real-time pricing — it is not.

Write-Through and Write-Behind

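Write-through caching updates the cache as part of every write to the source. When the application writes a new price, the cache is updated alongside the database, so a read that goes through the cache sees the new value as soon as the write completes. The staleness window effectively closes, provided every write takes this path and a failure of either write is handled.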

The cost is write latency. Every write now touches two systems instead of one. If the cache is remote (like Redis), that adds network round-trip time to every write operation. For write-heavy workloads, this can be significant.
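
A minimal write-through sketch follows, assuming two dicts as stand-ins for the database and the cache. Writing to the database first is a choice made for this example, so that a failed database write never leaves a value in the cache that the source does not have; the class and method names are hypothetical.

```python
class WriteThroughStore:
    """Every write updates the source and the cache in the same call."""

    def __init__(self):
        self.database = {}  # stand-in for the source of truth
        self.cache = {}     # stand-in for a cache such as Redis

    def write(self, key, value):
        self.database[key] = value  # first system touched
        self.cache[key] = value     # second system touched: the extra write cost

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.database[key]
        self.cache[key] = value
        return value


store = WriteThroughStore()
store.write("sku-1", 100)
store.read("sku-1")          # served from the cache
store.write("sku-1", 120)
print(store.read("sku-1"))   # 120: no stale read after the write
```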

Write-behind caching flips the order: write to the cache first, then asynchronously write to the source. This is fast, because the application sees cache-speed writes, but it introduces the risk of data loss. Until the asynchronous write completes, the data exists only in the cache. If the application crashes before flushing it, the source may never receive the write, and if the cache itself crashes in that window, the data is gone.
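
The window of risk is easier to see in code. In this sketch, `flush` is called by hand; a real system would run it from a background worker. The class, the pending queue, and the key names are all assumptions made for illustration.

```python
from collections import deque


class WriteBehindStore:
    """Writes land in the cache immediately and reach the source on flush."""

    def __init__(self):
        self.cache = {}
        self.database = {}       # stand-in for the source of truth
        self._pending = deque()  # writes the source has not seen yet

    def write(self, key, value):
        self.cache[key] = value
        self._pending.append((key, value))

    def flush(self):
        """Drain pending writes to the source; returns how many were written."""
        flushed = 0
        while self._pending:
            key, value = self._pending.popleft()
            self.database[key] = value
            flushed += 1
        return flushed


store = WriteBehindStore()
store.write("page_views", 1)
store.write("page_views", 2)

in_db_before_flush = store.database.get("page_views")  # None: at risk if we crash now
flushed = store.flush()
in_db_after_flush = store.database["page_views"]       # 2
print(in_db_before_flush, flushed, in_db_after_flush)  # None 2 2
```

Everything sitting in `_pending` at the moment of a crash is what would be lost.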

Write-behind is appropriate when speed matters more than durability — real-time analytics counters, session data, temporary state. It is not appropriate for financial transactions, inventory, or anything where losing a write is unacceptable.

TTL: The Simplest Invalidation

Time-to-live is the bluntest invalidation tool: every cached item expires after a fixed duration. After the TTL elapses, the next request triggers a fresh fetch from the source.

TTL is simple and predictable. Within a single cache layer, it bounds staleness at the TTL itself. A 5-minute TTL means data served from that layer is at most 5 minutes out of date.
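
A TTL cache can be sketched as a dict of values with expiry times. The example below injects a fake clock so the expiry is visible without waiting; `TTLCache` and its parameters are hypothetical names, and a real deployment would more likely rely on the expiry support built into its cache server.

```python
import time


class TTLCache:
    """Each entry expires a fixed number of seconds after it was loaded."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]                      # still within its TTL
        value = loader(key)                      # expired or missing: refetch
        self._entries[key] = (value, now + self._ttl)
        return value


fake_now = [0.0]                                 # controllable clock for the demo
source = {"price": 10}
cache = TTLCache(ttl_seconds=300, clock=lambda: fake_now[0])

first = cache.get("price", source.get)           # 10, loaded at t=0
source["price"] = 12                             # the source changes
fake_now[0] = 299.0
before_expiry = cache.get("price", source.get)   # 10: stale, but inside the TTL
fake_now[0] = 300.0
after_expiry = cache.get("price", source.get)    # 12: TTL elapsed, refetched
print(first, before_expiry, after_expiry)  # 10 10 12
```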

The art is in choosing the right TTL for each data type. Data that changes rarely (country codes, product categories, feature flags that are not used as kill switches) can have TTLs of hours or days. Data that changes frequently (inventory counts, prices, user status) needs TTLs of seconds or minutes. Data that must always be current should not be cached with TTL at all.

The common mistake is using a single TTL for everything. A 5-minute TTL that is perfect for product descriptions is an eternity for stock levels and unnecessary for static content that changes quarterly. Tune TTLs per data type, not per cache.
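
One way to keep TTLs per data type is a small lookup table that the caching code consults. The data type names and durations below are illustrative assumptions, not recommendations; the point is only that the TTL is chosen by what the data is, not by which cache it lives in.

```python
from datetime import timedelta

# Illustrative values only; tune these against how often each type changes.
TTL_BY_DATA_TYPE = {
    "country_codes": timedelta(days=1),
    "product_description": timedelta(minutes=5),
    "stock_level": timedelta(seconds=10),
}

# Data that must always be current is never cached with a TTL.
UNCACHED = {"account_balance"}

# Conservative fallback for data types nobody has classified yet.
DEFAULT_TTL = timedelta(seconds=30)


def ttl_for(data_type):
    """Return the TTL for a data type, or None if it must not be cached."""
    if data_type in UNCACHED:
        return None
    return TTL_BY_DATA_TYPE.get(data_type, DEFAULT_TTL)


print(ttl_for("stock_level"))      # 0:00:10
print(ttl_for("account_balance"))  # None
```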

Event-Driven Invalidation

The most precise invalidation strategy: when the source data changes, an event fires, and the cache entry is invalidated or updated. No TTL guessing. The staleness window shrinks to the delivery latency of the event system.

In practice, this means publishing an event (via a message queue, webhook, or change data capture stream) whenever a write occurs, and having a consumer that updates or evicts the corresponding cache entry.

The advantage is precision. Only the changed data is invalidated. Everything else stays cached and fresh.

The cost is infrastructure and complexity. You need an event system. You need a consumer that maps events to cache keys. You need to handle failures — what happens when the event is lost or delayed? You need to handle ordering — what happens when two updates to the same record arrive out of order?
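
The ordering question has a common answer: carry a version on each event and ignore anything older than what the cache already holds. The sketch below assumes the source assigns a monotonically increasing version per record, which is an assumption of this example and not something every event system provides; `ChangeEvent` and `EventDrivenCache` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvent:
    key: str
    version: int   # assumed to increase with every write to the record
    value: object


class EventDrivenCache:
    """Consumer side: apply change events, dropping stale or duplicate ones."""

    def __init__(self):
        self._entries = {}  # key -> (version, value)

    def apply(self, event):
        current = self._entries.get(event.key)
        if current is not None and event.version <= current[0]:
            return False  # out-of-order or duplicate delivery: ignore it
        self._entries[event.key] = (event.version, event.value)
        return True

    def get(self, key):
        entry = self._entries.get(key)
        return None if entry is None else entry[1]


cache = EventDrivenCache()
applied_v2 = cache.apply(ChangeEvent("sku-1", version=2, value=120))  # arrives first
applied_v1 = cache.apply(ChangeEvent("sku-1", version=1, value=100))  # arrives late
print(applied_v2, applied_v1, cache.get("sku-1"))  # True False 120
```

Lost events are a separate problem that versioning does not solve; a TTL as a backstop is one way to bound the damage.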

Event-driven invalidation is worth the investment for high-traffic, low-tolerance-for-staleness systems. It is overkill for a blog's tag cloud.

Multi-Layer Caching

Production systems rarely have a single cache. A typical web application might have browser caching, CDN caching, application-level caching (in-memory or Redis), and database query caching. Each layer serves a different purpose and has different invalidation characteristics.

The danger of multiple layers is compounding staleness. The database query cache has a 1-minute TTL. The application cache has a 5-minute TTL. The CDN has a 10-minute TTL. The browser has a 30-minute TTL. In the worst case, each layer refills from the layer below just before that layer's copy expires, so a user sees data that is 46 minutes old: the sum of all layers.
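
The worst-case figure is plain addition over the layers, which is worth keeping as a calculation next to the configuration so it is rechecked whenever a TTL changes. The layer names and values here are the ones from the example above.

```python
# TTL of each caching layer, in minutes, from source to user.
LAYER_TTLS_MINUTES = {
    "database query cache": 1,
    "application cache": 5,
    "CDN": 10,
    "browser": 30,
}

# Worst case: every layer refills from the one below just before it expires.
worst_case_minutes = sum(LAYER_TTLS_MINUTES.values())
print(worst_case_minutes)  # 46
```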

Managing multi-layer caching requires understanding which layers you control and which you don't. You control your application cache. You mostly control your CDN configuration. You influence browser caching through HTTP headers but cannot force a browser to discard its cache. Design your caching strategy with the full stack in mind, not just the layer you are currently working on.
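
For the browser and CDN layers, the main lever is the `Cache-Control` response header: `max-age` applies to any cache, and `s-maxage` overrides it for shared caches such as a CDN. The helper below is a hypothetical sketch for building that header value; the function name and parameters are not from any framework.

```python
def cache_control(browser_seconds, cdn_seconds=None, public=True):
    """Build a Cache-Control header value for a response."""
    parts = ["public" if public else "private", f"max-age={browser_seconds}"]
    # s-maxage only matters to shared caches, so skip it for private responses.
    if cdn_seconds is not None and public:
        parts.append(f"s-maxage={cdn_seconds}")
    return ", ".join(parts)


# Short browser lifetime, longer CDN lifetime: the CDN is the layer you can purge.
print(cache_control(browser_seconds=60, cdn_seconds=600))
# public, max-age=60, s-maxage=600

print(cache_control(browser_seconds=0, public=False))
# private, max-age=0
```

Keeping the browser lifetime shorter than the CDN lifetime reflects the point above: you can purge a CDN, but not a user's browser.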

The Takeaway

Caching is a trade-off between speed and correctness. Every caching decision should begin with two questions: how stale can this data be? And what happens if the cache serves wrong data?

For tolerant data, TTL-based cache-aside is simple and effective. For critical data, event-driven invalidation or write-through caching reduces staleness to near zero. For the space in between, the answer is usually a combination of strategies tuned per data type.

The worst caching strategy is the one nobody understands. Document your caching decisions. Make the staleness guarantees explicit. Monitor cache hit rates and invalidation patterns. A cache that is well-understood and well-monitored is a performance asset. A cache that is poorly understood is a correctness liability.

Next in the "Production Web Architecture" learning path: We'll cover database scaling patterns — read replicas, sharding, connection pooling, and how to know which one you actually need before you reach for the most complex option.

ShiftQuality