Caching stores the result of an expensive operation — a database query, an API call, a computed value — in fast storage so subsequent requests can be served without repeating the work. It is one of the highest-leverage performance techniques in distributed systems: a well-placed cache can reduce database load by orders of magnitude and cut response times from hundreds of milliseconds to single digits. At its core, caching exploits a simple empirical observation known as the 80/20 rule: 80% of traffic touches only 20% of the data. If you can keep that hot 20% in memory, you absorb the bulk of the load without touching the slower persistence tier at all.
But caching is not simply "put things in Redis." Every design decision — where in the stack you cache, which write strategy you use, what eviction policy governs memory pressure, how you handle TTLs, and how you defend against stampedes and avalanches — compounds into either a highly available, low-latency system or a subtle consistency minefield that breaks in production under load. This guide covers all of it: from the five canonical read/write strategies to eviction algorithm internals, distributed cache consistency, hot-key pathology, CDN edge caching, and the full taxonomy of failure modes with their mitigations.
- Cache-aside (lazy loading) is the safest default — only cache what's actually requested, and degrade gracefully on cache failure.
- Write-through keeps cache and DB in sync but wastes space on data that's never read; write-behind maximizes write throughput but risks data loss on crash.
- Write-around bypasses the cache on write — useful when written data is unlikely to be re-read soon (large batch imports, archival writes).
- LRU is the general-purpose eviction policy; use LFU for stable hot-key workloads; ARC adapts automatically; add TTL jitter to prevent avalanche.
- Cache stampede — when a popular key expires, many threads hit the DB simultaneously; fix with mutex locking or probabilistic early refresh.
- Redis vs Memcached — Redis wins for rich data structures, persistence, and pub/sub; Memcached wins for pure simple string throughput at scale.
- Bloom filter prevents cache penetration for non-existent keys without a DB lookup.
- Hot keys in a distributed cache saturate a single shard — replicate to multiple slots or add an in-process L1 cache in front of Redis.
Cache frequently-read, rarely-changing data close to the consumer. Choose a write strategy (cache-aside, write-through, write-behind, write-around, read-through) based on your consistency and write-amplification budget. Use TTL and an eviction policy (LRU is the safe default, ARC adapts) to bound memory. Design for cache misses — a cold start or stampede should degrade gracefully, not cascade into a database meltdown. Defend against penetration with Bloom filters, against hot keys with local replication, and against distributed inconsistency with careful invalidation or short TTLs.
Where to Cache: The Layered Stack
Caches can live at multiple layers of the stack, each with different tradeoffs in latency, visibility, and invalidation complexity:
- Client-side (browser) — HTTP caching headers (
Cache-Control,ETag,Last-Modified) let browsers reuse assets without any server round-trip. Free performance for static content; the catch is that you cannot proactively invalidate a browser cache without changing the resource URL. - CDN (edge cache) — Geographically distributed caches (Cloudflare, Fastly, AWS CloudFront) serve static and semi-static content from points of presence close to users. Reduces latency from ~200 ms to single-digit milliseconds and drastically cuts origin server load. Configurable TTLs and cache purge APIs allow targeted invalidation.
- Reverse proxy / API gateway cache — Nginx or Varnish sitting in front of the application tier can cache full HTTP responses. Useful when many users make identical requests (public news pages, product detail pages); bypassed automatically when the response varies by user.
- Application-level in-process (L1) — A hash map or a library like Caffeine (JVM) or django-cacheops (Python) lives inside the app process. Zero network overhead, nanosecond access times. Downsides: not shared across instances, evicted on restart, invisible to other services. Best used as an ultra-fast layer in front of Redis for the hottest keys.
- Distributed cache (L2) — A shared external store (Redis, Memcached) accessible by all application instances. The standard approach for session data, computed results, hot database rows, and any data that must be consistent across a fleet of app servers. Adds a network hop (~0.1–1 ms) but survives process restarts and scales horizontally.
- Database query cache — Some databases (MySQL query cache, PostgreSQL's shared buffer pool) cache result sets or buffer pages internally. This is generally opaque and not something you design; it is a reason not to panic at the first slow query on a warm instance.
Layer your caches from fast to slow: L1 in-process → L2 distributed → database. A cache miss at L1 hits L2 before touching the database. This keeps database connections free for writes and genuinely uncacheable reads. The key insight is that each layer absorbs a different fraction of traffic, so even a modest L1 hit rate dramatically reduces L2 and DB pressure.
The Five Cache Write Strategies
There is no single correct strategy. The right choice depends on your consistency tolerance, write frequency, and what happens if the cache and the database diverge. Most real systems combine multiple strategies across different data types.
1. Cache-Aside (Lazy Loading)
The application owns the cache interaction entirely. On a read: check the cache first; if it misses, query the database, write the result to the cache, then return it. On a write: update the database, then either delete or update the cache entry.
def get_product(product_id):
key = f"product:{product_id}"
data = cache.get(key) # 1. check cache
if data:
return data # cache hit — done
data = db.query("SELECT * FROM products WHERE id=?", product_id)
cache.setex(key, 300, data) # 2. populate on miss
return data
def update_product(product_id, payload):
db.execute("UPDATE products SET ... WHERE id=?", product_id)
cache.delete(f"product:{product_id}") # 3. invalidate on write
- Pro: Only requested data is cached — no wasted memory on cold data. App degrades to DB-only on cache failure.
- Con: First request after expiry is always slow (cache-cold latency spike). If you invalidate the cache entry on write but the write itself fails after invalidation, the cache is needlessly cold.
- Best for: Read-heavy workloads with infrequent updates. The default choice for most product, user, and session caches.
2. Read-Through
Similar to cache-aside but the cache library itself sits between the application and the database. The app always talks to the cache; on a miss, the cache goes to the database, populates itself, and returns the value. The application code does not need to implement the miss-and-populate logic explicitly.
- Pro: Simpler application code — the caching concern is fully encapsulated. Works well with ORM-level caching libraries.
- Con: Initial cache-miss latency is still present. Harder to customize the population logic (e.g., fallbacks, transformations).
- Best for: ORM or framework-managed caches where you want to avoid boilerplate miss-and-populate code in every service.
3. Write-Through
Every write goes to both the cache and the database synchronously before acknowledging success to the caller. The cache is always consistent with the database immediately after any write.
- Pro: Strong consistency — reads always hit a warm, accurate cache. No stale data window.
- Con: Every write pays the cache write penalty (extra network hop). Data written but never read wastes cache memory. Write latency is the sum of both operations.
- Best for: Systems where read consistency is critical and write frequency is low-to-moderate (user profile updates, configuration data).
4. Write-Behind (Write-Back)
Writes land in the cache immediately; the database write is deferred and batched asynchronously in the background. The cache serves as the primary write target, and the database is eventually brought in sync.
- Pro: Very fast writes with minimal write latency for the caller. Reduces database write pressure — multiple in-flight updates to the same key can be coalesced into one DB write.
- Con: If the cache node fails before flushing, data is lost. The database is transiently stale, which is problematic if other systems read the DB directly. Complex failure recovery logic required.
- Best for: Write-heavy workloads where data loss of recent writes is acceptable (analytics counters, gaming scores). Never use for financial transactions.
5. Write-Around
Writes go directly to the database, bypassing the cache entirely. The cache is only populated on subsequent reads (lazy loading). This prevents the cache from being filled with data that is unlikely to be re-read soon after being written.
- Pro: Prevents cache pollution from large, infrequent writes (batch imports, archival operations). Cache remains warm with genuinely hot data.
- Con: First read after a write will always miss the cache and hit the database — potential high miss rate for recently-written data that is immediately read.
- Best for: Data written in bulk but read infrequently (log ingestion, data warehouse ETL loads, media uploads).
| Strategy | Write Path | Consistency | Write Speed | Best Fit |
|---|---|---|---|---|
| Cache-Aside | DB, then invalidate cache | Eventual (TTL-bounded) | Fast (no cache write) | General read-heavy |
| Read-Through | DB, then cache auto-populates on miss | Eventual | Fast | ORM-managed caches |
| Write-Through | Cache + DB (sync) | Strong | Slower (dual write) | Low-write, read-critical |
| Write-Behind | Cache immediately, DB async | Eventual (risky) | Fastest | High write throughput |
| Write-Around | DB only, bypasses cache | Eventual (cold writes) | Fast | Batch/archival writes |
Eviction Policies: Algorithms in Depth
When the cache is full, something must be evicted to make room. The policy you choose determines which data survives — and getting it wrong means evicting hot entries while keeping cold ones, which destroys your hit rate. Each algorithm encodes a different assumption about your access pattern.
LRU — Least Recently Used
Evicts the item that has not been accessed for the longest time. Implemented as a doubly-linked list where accessed items are moved to the front; the item at the back is the eviction candidate. O(1) access and update with a hash map backing the list.
LRU works well because most workloads exhibit temporal locality — recently accessed data tends to be accessed again soon. However, LRU has a famous failure mode: a single sequential scan over a large dataset (a full-table backup, a batch report) evicts the entire working set and leaves the cache cold for the hot OLTP queries that follow. This is sometimes called a cache pollution event.
LFU — Least Frequently Used
Evicts the item with the lowest access count. Naturally keeps viral content and stable hot keys alive regardless of access recency. The downside: newly inserted items start with a count of 1 and are vulnerable to immediate eviction even if they are about to become hot. Modern LFU implementations (TinyLFU, used in Redis 4.0+ as allkeys-lfu) combine a frequency sketch with a small recency window to handle the "new popular item" problem.
ARC — Adaptive Replacement Cache
ARC automatically adapts between LRU and LFU by maintaining two lists: one for recently-used items (T1) and one for frequently-used items (T2), plus ghost lists (B1, B2) that remember recently evicted items' identities without their data. When a ghost-list hit occurs, ARC shifts the balance toward that list's strategy. The result is a cache that learns your access pattern in real time and self-tunes without configuration. ARC is patent-encumbered (IBM), so some systems implement approximations like LIRS or CLOCK-Pro instead.
FIFO — First In, First Out
The simplest policy: evict the oldest-inserted item. Requires no access tracking, making it fast and memory-efficient to implement. Almost never optimal for a cache because it has no connection to access frequency or recency; an item inserted first might be the most-accessed one in the system. FIFO's legitimate use cases are simple message buffers and rate-limiter windows, not general-purpose caches.
TTL-Based Expiration
Items are evicted when they pass their expiry timestamp, independent of the eviction policy. TTL and eviction work in concert: TTL controls data freshness; eviction controls memory pressure. Redis combines both — you set a TTL per key and choose a policy (LRU, LFU, etc.) that governs what happens when memory is full and no TTLs have fired yet.
| Policy | Evicts | Strengths | Weaknesses |
|---|---|---|---|
| LRU | Least recently accessed | Great temporal locality; simple O(1) | Sequential scan pollution; ignores frequency |
| LFU | Fewest total accesses | Keeps stable hot keys alive long-term | New hot items start cold; stale counts linger |
| ARC | Adapts LRU+LFU dynamically | Self-tuning; best of both worlds | More memory overhead; patent-encumbered |
| FIFO | Oldest inserted | Zero overhead; deterministic | No relationship to access pattern |
| TTL | Past expiry timestamp | Guarantees freshness; predictable staleness window | Not memory-pressure-driven; stale data until expiry fires |
TTL Design: Freshness vs. Stability
Setting a TTL is a tradeoff between freshness (how quickly a cache update reflects a database change) and stability (how often you pay the miss penalty). Getting it right requires understanding both the data's change frequency and the business cost of serving stale data.
- Product price — a stale price is a legal and financial risk; use a short TTL (30–60 seconds) and eager invalidation on price change events.
- User profile — acceptable staleness of a few minutes; a 5-minute TTL with event-driven invalidation on profile update is a reasonable tradeoff.
- Session data — typically a sliding TTL, reset on every access, with an absolute maximum (e.g., reset to 30 minutes each request, hard expiry at 7 days).
- Static reference data — country lists, currency codes, configuration — can carry a TTL of hours or even days since they change very rarely.
- Rate limit counters — fixed TTL equal to the rate-limit window (e.g., 1 minute for 100 req/min); no eviction policy needed because expiry is the only lifecycle event.
Never set the same TTL for a large batch of keys — if you load 10,000 product records into Redis with a uniform TTL of 300 seconds, they all expire simultaneously 5 minutes later, generating a thundering-herd avalanche on the database. Add random jitter: ttl = base_ttl + random.randint(0, base_ttl // 5). This spreads expiries over time and eliminates the synchronized miss burst.
The Three Fatal Failure Modes
Cache Stampede (Thundering Herd)
When a single popular key expires under high traffic, many concurrent requests simultaneously discover a cache miss and all race to rebuild it from the database. If rebuilding takes 200 ms and you have 1,000 req/s hitting that key, you generate 200 concurrent database queries in the first second alone — enough to crash a moderately sized database.
Mitigations:
- Mutex locking — the first miss acquires a distributed lock (Redis
SET NX) and rebuilds the entry; all other concurrent misses wait or serve a slightly stale value. Eliminates the thundering herd at the cost of increased miss latency for the waiting requests. - Probabilistic early expiration (PER) — before the TTL fires, re-compute and refresh the cache entry with some probability that increases as expiry approaches. The formula: refresh if
-beta * log(rand()) > time_to_expiry. This spreads the refresh work across time, so the key never actually expires under heavy traffic. - Background refresh — serve the stale value immediately while a background goroutine/thread re-populates the cache asynchronously. The user sees slightly stale data for one request; subsequent requests see fresh data. Requires accepting brief inconsistency.
- TTL jitter — if many users cache the same key at the same time (e.g., after a deployment flush), jitter prevents synchronized expiry.
import time, math, random
def fetch_with_per(key, beta=1.0):
# probabilistic early recompute — avoids stampede
entry = cache.get_with_ttl(key) # returns (value, remaining_ttl)
if entry is None:
return recompute_and_store(key) # cold miss
value, ttl = entry
delta = time.monotonic() - time.monotonic() # compute time proxy
if -beta * math.log(random.random()) > ttl:
return recompute_and_store(key) # early recompute
return value # serve cached value
Cache Penetration
Requests for keys that do not exist in either the cache or the database pass through to the database on every call. Each request misses the cache (nothing to cache since the result is empty) and triggers a full database query. A common cause is malicious clients querying invalid IDs in a loop — a DoS vector that bypasses the cache entirely.
Mitigations:
- Cache negative results — store a sentinel value (e.g.,
"__null__"or an empty object) with a short TTL (30–60 seconds) when a key is not found. Subsequent requests for the same non-existent key hit the cache and return immediately. - Bloom filter — maintain a probabilistic data structure in front of the cache that tracks which keys definitely exist. A Bloom filter never produces false negatives (if it says "not present," the key definitely isn't in the DB), so any key the filter rejects can be returned immediately without a cache or DB lookup. False positives (filter says "present" but key is missing) still hit the DB, so size your filter for a low false-positive rate (~1%).
import redis, random
from pybloom_live import BloomFilter
r = redis.Redis(host='localhost', port=6379)
bloom = BloomFilter(capacity=1_000_000, error_rate=0.01)
def get_user(user_id):
if user_id not in bloom:
return None # bloom says definitely missing
key = f"user:{user_id}"
cached = r.get(key)
if cached == "__null__":
return None # cached negative result
if cached:
return cached
user = db.query("SELECT * FROM users WHERE id=?", user_id)
if user is None:
r.setex(key, 30, "__null__") # cache negative result
return None
ttl = 300 + random.randint(0, 60) # jitter to prevent avalanche
r.setex(key, ttl, user)
bloom.add(user_id)
return user
Cache Avalanche
A large batch of keys expires simultaneously, sending a flood of cache misses to the database at exactly the same moment. Common triggers: a rolling deployment that flushes the cache, a bulk data import that populates thousands of keys with the same TTL, or a caching layer restart after a maintenance window.
Mitigations:
- TTL jitter — randomize expiry times on write so expirations are spread across a window rather than synchronized.
- Cache pre-warming — before a deployment that will flush the cache, run a warm-up script that populates the highest-traffic keys into the new cache before traffic is switched over.
- Circuit breaker on the database connection pool — if the database is suddenly overwhelmed by miss traffic, a circuit breaker prevents the application from opening hundreds of new connections and amplifying the problem. Fail fast and return a degraded response rather than queue indefinitely.
- Multi-tier caching — if L2 (Redis) avalanches, an L1 in-process cache absorbs a significant fraction of traffic for a few seconds, giving Redis time to rebuild.
Hot Keys: The Silent Bottleneck
In a distributed cache cluster like Redis Cluster, each key lives on exactly one shard. A hot key — a single product page, a trending tweet, a viral news article — can receive millions of requests per second, all routed to the same shard. That shard becomes a bottleneck regardless of how many other shards are idle. This is a distinct problem from cache stampede (which is about expiry) — hot keys are a throughput problem that exists even when the key is populated and never expires.
Mitigations:
- Local replication — store the hot key under multiple names with a suffix (
product:42:0,product:42:1, ...,product:42:N) spread across different shards. Read requests pick a random replica. This fans out reads across N shards at the cost of N-copies of storage and an N-fold invalidation broadcast on update. - In-process L1 cache — add a small in-process cache (Caffeine, a simple LRU dict) in front of Redis in each application instance. The hot key is served from local memory; Redis is consulted only on a local miss. A TTL of even 1 second absorbs a large fraction of requests on a high-traffic app server fleet.
- Read replicas for the cache — Redis Cluster does not route reads to replicas by default (
READONLYmode must be explicitly enabled per connection). Enabling replica reads spreads hot-key read load across the primary and its replicas. - Identify hot keys proactively — Redis provides
redis-cli --hotkeysand theMONITORcommand for identifying hot keys in production before they cause incidents.
import random
NUM_REPLICAS = 10
def get_hot_product(product_id):
# spread reads across N replica keys on different shards
replica_idx = random.randint(0, NUM_REPLICAS - 1)
key = f"product:{product_id}:r{replica_idx}"
return cache.get(key) or rebuild_and_fan_write(product_id)
def invalidate_hot_product(product_id):
# on update, invalidate all replica keys
keys = [f"product:{product_id}:r{i}" for i in range(NUM_REPLICAS)]
cache.delete(*keys)
Distributed Cache Consistency
A distributed cache introduces a second system of record: the cache and the database can hold different values for the same key at any point in time. Managing this divergence is one of the hardest problems in caching at scale. The core tension is between two approaches to invalidation:
- Delete-on-write (the safer choice) — when the database is updated, delete the cache entry. The next reader pays the miss penalty and re-populates with fresh data. Simple, correct under most failure modes. The only risk is that between the DB write and the cache delete, another reader might cache a stale value if the operations are not atomic.
- Update-on-write — when the database is updated, also update the cache entry. Avoids the miss penalty but introduces a race: if two writes happen concurrently, the second cache write might land before the first, leaving an older value in cache than in the DB.
Always write to the database first, then invalidate the cache — not the other way around. If you invalidate the cache first and then the DB write fails, you have a cold cache with a stale (or missing) DB value. If you write the DB first and the cache invalidation fails, your cache is briefly stale but the DB is authoritative; the TTL will eventually clean it up. The asymmetry matters.
For systems requiring stronger guarantees, consider event-driven invalidation via Change Data Capture (CDC): a CDC connector (Debezium, Maxwell) reads the database write-ahead log and publishes change events to a message queue. Cache workers subscribe and invalidate or refresh the cache entries. This completely decouples the write path from cache invalidation latency and ensures the cache is eventually consistent with the DB's WAL, which is the authoritative record of all writes.
Redis vs. Memcached: An Honest Comparison
Redis and Memcached are both in-memory key-value stores but they diverged substantially in their design philosophy and feature set. The decision is rarely close once you know your requirements.
| Dimension | Redis | Memcached |
|---|---|---|
| Data structures | Strings, hashes, lists, sets, sorted sets, HyperLogLog, streams, bitmaps, geospatial | Strings only (binary-safe) |
| Persistence | RDB snapshots + AOF write-ahead log; configurable durability vs performance tradeoff | None — memory-only; all data lost on restart |
| Clustering / HA | Native Redis Cluster (hash-slot partitioning) + Sentinel for automatic failover | Client-side consistent hashing; no server-side cluster |
| Pub/Sub | Yes — native publish/subscribe + streams (Kafka-lite for simple use cases) | No |
| Lua scripting | Yes — atomic server-side scripts; enables complex atomic operations | No |
| Transactions | MULTI/EXEC (optimistic) + WATCH for CAS | No |
| Threading model | Single-threaded event loop (network I/O multi-threaded since 6.0) | Multi-threaded — true CPU parallelism per connection |
| Memory efficiency | Rich structures add overhead; use hashes for small objects | Lean, minimal overhead for raw strings |
| Use cases | Sessions, leaderboards (sorted sets), rate limiting, queues, pub/sub, feature flags | Simple string caching at extreme scale, read-only object caches |
Redis is the default choice for most new systems. Memcached retains a meaningful edge in pure raw-string throughput on multi-core machines — its multi-threaded architecture allows true CPU-level parallelism per connection, whereas Redis's single-threaded command processing serializes all commands regardless of available cores (network I/O was multi-threaded in 6.0, but command execution remains single-threaded). For a workload that is literally millions of simple GET/SET on plain strings with no need for persistence, rich types, or pub/sub, Memcached can outperform Redis by 20–40% at extreme scale. For everything else — sorted set leaderboards, stream processing, Lua-atomic rate limiters, persistent sessions — Redis is unambiguously the better choice.
Redis supports multiple eviction policies configured via maxmemory-policy: noeviction (return errors when full), allkeys-lru (LRU across all keys), volatile-lru (LRU only among TTL-set keys), allkeys-lfu (LFU across all keys), allkeys-random, and volatile-ttl (evict keys with shortest TTL first). The most common production choice is allkeys-lru for a pure cache, and volatile-lru when Redis is dual-used as both a cache and a data store (only evict TTL-expiring cache entries, never the persistent data).
CDN Edge Caching
A Content Delivery Network is a globally distributed cache that sits at the network edge — geographically close to end users. CDNs are the highest-leverage cache you can deploy for public-facing content: instead of a request from Singapore traveling 200 ms to your US-East origin, it hits a CDN edge node in Singapore and returns in under 10 ms.
What to cache at the CDN
- Static assets — JavaScript, CSS, images, fonts. These are immutable with content-hashed filenames; set a
Cache-Control: max-age=31536000, immutableheader for a 1-year TTL with no invalidation needed. - Public API responses — product listings, category pages, public profiles. Cache with a short TTL (30–300 seconds) using
Cache-Control: public, max-age=60, s-maxage=300—s-maxagecontrols the CDN TTL independently from the browser TTL. - HTML pages — for non-personalized pages (landing pages, marketing content), caching at the CDN dramatically reduces origin load. Use
Vary: Accept-Encodingcarefully to avoid serving gzipped content to clients that cannot decompress it.
What not to cache at the CDN
- Authenticated or personalized responses — never cache responses that contain user-specific data behind a CDN; set
Cache-Control: privateorno-storeto prevent CDN caching and ensure privacy. - POST/PUT/DELETE requests — CDNs only cache GET and HEAD by default. Mutation requests must always pass through to origin.
- Payment and checkout flows — even with
GET, never cache financial data or security-sensitive responses.
Cache invalidation at the CDN
CDN invalidation (cache purge) is expensive in two ways: it has an API rate limit, and it takes seconds to propagate to all edge nodes globally. The standard strategy is to avoid needing invalidation for static assets (content-hash URLs) and to rely on short TTLs for semi-dynamic content. For urgent invalidation (a data breach requiring immediate removal), use the CDN's purge API but treat it as a rare emergency tool, not a routine cache-refresh mechanism.
Caching Patterns in Production: Real Scenarios
E-commerce product page
A product detail page aggregates data from multiple services: catalog, inventory, pricing, and reviews. The typical pattern is fragment caching: cache each fragment independently at its appropriate TTL rather than caching the assembled page. Product description: 1-hour TTL; pricing: 30-second TTL (or event-invalidated); inventory: not cached (always read from source). The CDN caches the assembled page at a 60-second TTL for anonymous users; personalized or authenticated responses bypass the CDN entirely.
Social feed (news feed / timeline)
User timelines are expensive to assemble from raw graph data. The industry standard (Facebook Memcache, Twitter Pelikan) is to cache pre-assembled timeline objects per user. Writes fan out to the timelines of all followers — the push-on-write model — so reads are always cache hits. For users with millions of followers (celebrities), the fan-out is unbounded; switch to a pull-on-read model for those accounts and merge their content into the timeline at read time.
Rate limiting with Redis
Redis sorted sets or atomic increments are the standard for distributed rate limiting. The sliding-window log pattern stores each request timestamp as a sorted set member; a ZRANGEBYSCORE counts requests in the last N seconds; ZADD adds the new request; ZREMRANGEBYSCORE prunes old entries — all in a Lua script for atomicity. The fixed-window counter is simpler: INCR key + EXPIRE key window_seconds.
-- Redis Lua: sliding-window rate limiter (atomic)
local key = KEYS[1] -- e.g. "ratelimit:user:42"
local now = tonumber(ARGV[1]) -- current timestamp ms
local window = tonumber(ARGV[2]) -- window size ms (e.g. 60000)
local limit = tonumber(ARGV[3]) -- max requests per window
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
local count = redis.call('ZCARD', key)
if count >= limit then
return 0 -- rate limited
end
redis.call('ZADD', key, now, now)
redis.call('PEXPIRE', key, window)
return 1 -- allowed
Cache Observability: What to Measure
A cache you cannot observe is a cache you cannot tune. The key metrics to instrument:
- Hit rate —
hits / (hits + misses). A healthy general-purpose cache should sustain 90%+ hit rates for hot data. A hit rate below 80% suggests either under-caching, a key design problem, or excessive churn from short TTLs. - Miss rate by key prefix — break down misses by key pattern to find which data types are not caching well. A high miss rate on
product:*but a low miss rate onsession:*points to a product caching problem, not a systemic one. - Eviction rate — if the eviction rate is high, your cache is too small relative to your working set. Either increase memory or reduce TTLs to shrink the working set.
- Memory usage and fragmentation — Redis
INFO memoryshowsused_memoryvsused_memory_rss; high fragmentation (ratio above 1.5) indicates memory waste that can be reclaimed by restarting the instance or runningMEMORY PURGE(Redis 4.0+). - P99 cache latency — Redis should serve commands in under 1 ms. Spikes above 5 ms indicate a hot-key problem, a large key (e.g., a huge serialized JSON), or a slow command (
KEYS,SMEMBERSon a large set). - Connection count — Redis is single-threaded; connection overhead is real. Use connection pooling and monitor active vs idle connections.
Best Practices and Anti-Patterns
Do
- Add TTL to every cache entry — no TTL means you are trusting your invalidation logic perfectly, which is never true.
- Use jitter on TTLs for batch-loaded data — eliminates synchronized expiry avalanches.
- Cache at the right granularity — cache the assembled object, not the raw DB row, so the cache hit is maximally useful. But if sub-objects have different freshness requirements, cache them separately.
- Serialize efficiently — JSON is convenient but verbose; use MessagePack or Protocol Buffers for high-volume cache values to reduce memory and deserialization time.
- Name keys with a consistent, hierarchical scheme —
service:entity:id:field— so pattern-based invalidation and monitoring work cleanly. - Test cache failure — periodically kill the cache in staging to verify the application degrades gracefully instead of cascading into a full outage.
Do Not
- Cache mutable financial data (balances, order status) unless you have ironclad invalidation — staleness here has direct monetary consequences.
- Use
KEYS *in Redis production — it is O(N) and blocks all other commands while executing. UseSCANinstead. - Store large values in Redis — a 1 MB value in a cluster is no problem, but a 100 MB value serialized as one key will cause latency spikes. Chunk large objects or store in an object store with a cache pointer.
- Treat the cache as the primary store — it is a fast accessory to the database, not a replacement. If the cache is empty, the application must still function correctly (perhaps slowly) by falling through to the DB.
- Forget about serialization versioning — if you change the schema of a cached object and redeploy, old cache entries will fail to deserialize. Use versioned key prefixes or carry a version field in the serialized payload.
Cache-aside is the safest starting point — it is simple, only caches what is actually needed, and degrades gracefully on cache failure. Add TTL jitter from day one to avoid avalanche. Understand your eviction policy relative to your access pattern — LRU for general workloads, LFU for stable hot-key sets, ARC when you cannot predict which. Reach for Redis when you need persistence, pub/sub, sorted sets, or atomic Lua scripts; consider Memcached only when raw string throughput at extreme scale is the sole requirement. Treat hot keys, stampede, penetration, and avalanche as distinct failure modes — each needs a specific mitigation, not a generic "add more cache" response.
Cache stampede vs cache avalanche — what's the difference? Stampede: one popular key expires and many threads hit the DB simultaneously — fix with mutex locking or probabilistic early refresh. Avalanche: many keys expire simultaneously sending a flood of misses — fix with TTL jitter and pre-warming. Distinct problems, distinct fixes.
How do you handle cache penetration? Cache negative results (a sentinel like "__null__" with a short TTL) or place a Bloom filter in front to reject nonexistent keys before they hit the cache or DB.
Redis or Memcached? Redis for anything beyond simple string caching — persistence, pub/sub, sorted sets for leaderboards, Lua-atomic rate limiters, and streams. Memcached only when raw string throughput at multi-core extreme scale is the sole requirement and ops simplicity is paramount.
What eviction policy do you use? allkeys-lru for a pure cache; volatile-lru if Redis is dual-purposed with persistent data. Switch to allkeys-lfu when the working set has stable, clearly defined hot keys (leaderboards, viral content).
How do you handle a hot key saturating one Redis shard? Replicate the key to N slots with a random suffix and fan out reads; add an in-process L1 cache with a 1-second TTL to absorb the bulk of requests before they even hit Redis.