Caching Interview Questions for Senior Engineers (2026)
Top caching interview questions with detailed answer frameworks covering cache strategies, eviction policies, distributed caching, cache invalidation, and performance optimization.
Why Caching Matters in Senior Engineering Interviews
Caching is one of the most impactful performance optimizations in system design, yet it introduces some of the most subtle bugs and consistency challenges in distributed systems. The famous quote, "There are only two hard things in Computer Science: cache invalidation and naming things," reflects the real difficulty of getting caching right.
Senior engineers are expected to understand not just that caching makes things faster, but how to choose the right caching strategy, how to handle invalidation correctly, how to size and monitor caches, and how to avoid the pitfalls that cause production incidents. At companies like Google, Meta, and Amazon, caching questions appear in almost every system design interview because caching decisions affect performance, consistency, cost, and reliability.
For foundational concepts, review our caching strategies guide and the system design interview guide. Explore our learning paths for hands-on preparation.
1. Explain the different caching strategies and when to use each
What the interviewer is really asking: Do you understand the architectural implications of each caching pattern, not just how they work?
Answer framework:
Cache-aside (lazy loading): the application checks the cache first. On a cache miss, it reads from the database, writes to the cache, and returns. The application manages all cache interactions. Most common pattern. Advantages: only requested data is cached, cache failure does not prevent reads. Disadvantages: cache miss penalty (three network trips: cache, database, cache write), stale data until TTL expires or explicit invalidation.
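A minimal cache-aside sketch in Python using redis-py illustrates the read path; the key format, the 300-second TTL, and the fetch_user_from_db helper are illustrative assumptions, not a prescribed implementation.

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)
TTL_SECONDS = 300  # illustrative staleness budget; tune per data type

def fetch_user_from_db(user_id: int) -> dict:
    # stand-in for a real database query
    return {"id": user_id, "name": "example"}

def get_user(user_id: int) -> dict:
    """Cache-aside read: check the cache first, fall back to the database on a miss."""
    key = f"user:{user_id}"                        # hypothetical key scheme
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)                  # cache hit
    user = fetch_user_from_db(user_id)             # cache miss: load from the source
    r.setex(key, TTL_SECONDS, json.dumps(user))    # populate the cache with a TTL
    return user
```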
Read-through: the cache sits between the application and the database. On a miss, the cache itself fetches from the database. The application only talks to the cache. Advantage: simpler application code. Disadvantage: the cache must know how to fetch from the database (coupling). Used by Hibernate's second-level cache and CDNs.
Write-through: every write goes to the cache and the database synchronously. The cache is always consistent with the database. Advantage: reads never get stale data. Disadvantage: higher write latency (two writes per operation), cache may contain data that is never read (wastes cache space).
Write-behind (write-back): writes go to the cache immediately, and the cache asynchronously writes to the database. Advantage: very fast writes, the cache absorbs write spikes. Disadvantage: data loss risk if the cache crashes before flushing to the database, complexity of ensuring writes eventually reach the database. Used by operating system page caches and some database storage engines.
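A toy single-process write-behind sketch (plain Python, no Redis) shows the buffering idea; the flush interval and write_to_db helper are assumptions, and real implementations need durability, retries, and ordering guarantees that are omitted here.

```python
import threading
import time

cache: dict[str, str] = {}       # stand-in for the in-memory cache tier
pending: dict[str, str] = {}     # dirty entries not yet persisted
lock = threading.Lock()

def write(key: str, value: str) -> None:
    """Write-behind: update the cache immediately, defer the database write."""
    with lock:
        cache[key] = value
        pending[key] = value     # repeated writes to the same key coalesce here

def write_to_db(key: str, value: str) -> None:
    pass                         # stand-in for the real persistence call

def flush_loop(interval_seconds: float = 1.0) -> None:
    """Background thread: periodically drain dirty entries to the database."""
    while True:
        time.sleep(interval_seconds)
        with lock:
            batch = dict(pending)
            pending.clear()
        for key, value in batch.items():
            write_to_db(key, value)   # needs retries in practice; a crash here loses data

threading.Thread(target=flush_loop, daemon=True).start()
```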
Write-around: writes go directly to the database, bypassing the cache. The cache is only populated on reads. Advantage: avoids polluting the cache with data that might not be read. Disadvantage: cache miss on the first read after every write.
Choosing the right strategy depends on your workload: read-heavy with infrequent updates (cache-aside with TTL), read-heavy with frequent updates (write-through), write-heavy (write-behind if you can tolerate potential data loss), and write-once-read-many (cache-aside or read-through).
Follow-up questions:
- How would you combine multiple caching strategies in the same system?
- What happens to write-behind cache data during a deployment?
- How does the caching strategy affect consistency guarantees?
2. How does cache invalidation work in a distributed system?
What the interviewer is really asking: Can you handle the hardest problem in caching without introducing subtle bugs?
Answer framework:
The core challenge: when the source data changes, the cached copy must be updated or removed. In a distributed system with multiple cache nodes and multiple application instances, this coordination is non-trivial.
TTL-based expiration: the simplest approach. Each cache entry has a time-to-live. After expiration, the next read triggers a cache miss and fresh data is loaded. Trade-off: during the TTL window, stale data is served. Short TTLs reduce staleness but increase cache miss rates and database load. Long TTLs improve hit rates but increase staleness.
Explicit invalidation: when data changes, the application explicitly deletes or updates the cache entry. More precise than TTL but requires discipline: every write path must include cache invalidation. A single missed invalidation causes persistent stale data. In microservices, the service that updates the database might not own the cache, requiring inter-service communication for invalidation.
Event-based invalidation: publish data change events to a message queue like Kafka. Cache invalidation consumers subscribe to these events and update or remove the affected cache entries. Advantage: decouples the write path from cache management. Disadvantage: eventual consistency (there is a window between the database write and the cache invalidation).
The race condition problem: consider this sequence: (1) cache entry expires, (2) thread A reads from database (gets value V1), (3) another process updates database to V2, (4) the update process invalidates the cache, (5) thread A writes V1 to cache. Now the cache has stale data V1 even though the database has V2. Solutions: use a versioned cache (include a version number and only accept writes with a version >= current), or use cache leasing (Facebook's approach in Memcache: the cache returns a lease token on a miss, the write must include the lease token, and the token is invalidated on cache invalidation).
For distributed caching: use a pub/sub mechanism to broadcast invalidation messages to all cache nodes. Redis Pub/Sub or dedicated invalidation channels in Kafka work well. Each node that holds a copy of the invalidated key removes it.
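A sketch of this broadcast pattern with redis-py Pub/Sub; the channel name, the in-process cache, and the split between invalidate() and the listener thread are illustrative assumptions.

```python
import redis

r = redis.Redis()
local_cache: dict[str, object] = {}   # per-instance in-process cache
CHANNEL = "cache-invalidation"        # assumed channel name

def invalidate(key: str) -> None:
    """Writer side: call after the database update commits."""
    r.delete(key)                     # drop the shared (distributed) cache entry
    r.publish(CHANNEL, key)           # tell every instance to drop its local copy

def listen_for_invalidations() -> None:
    """Run in a background thread on every application instance."""
    pubsub = r.pubsub()
    pubsub.subscribe(CHANNEL)
    for message in pubsub.listen():
        if message["type"] == "message":
            local_cache.pop(message["data"].decode(), None)
```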
Follow-up questions:
- How does Facebook handle cache invalidation at scale (the memcache paper)?
- What is the thundering herd problem in cache invalidation and how do you solve it?
- How do you handle cache invalidation across multiple data centers?
3. How do you choose the right eviction policy for your cache?
What the interviewer is really asking: Do you understand the trade-offs between eviction algorithms and can you match them to workload characteristics?
Answer framework:
Eviction policies determine which entries to remove when the cache is full. The right choice depends on your access pattern.
LRU (Least Recently Used): evict the entry that was accessed least recently. Works well for workloads with temporal locality (recently accessed items are likely to be accessed again). Most common default. Implementation: a doubly linked list plus a hash map gives O(1) operations. Redis uses approximate LRU (sampling 5 random keys and evicting the least recently used among them) to avoid the memory overhead of exact LRU.
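A compact exact-LRU sketch in Python, using OrderedDict to play the role of the doubly linked list plus hash map; the capacity parameter is illustrative.

```python
from collections import OrderedDict

class LRUCache:
    """Exact LRU: OrderedDict gives O(1) lookup, move-to-front, and eviction."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries: OrderedDict[str, object] = OrderedDict()

    def get(self, key: str):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)          # mark as most recently used
        return self.entries[key]

    def put(self, key: str, value: object) -> None:
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used
```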
LFU (Least Frequently Used): evict the entry that was accessed least often. Works well for workloads with stable hot keys (some keys are consistently popular). Better than LRU when popular items are not always recently accessed. Implementation challenge: frequency counters grow unboundedly. Use count-min sketch for approximate frequency tracking, or periodically halve all counters (aging).
FIFO (First In, First Out): evict the oldest entry. Simple but ignores access patterns entirely. Can work for time-series data where older data is naturally less relevant.
TTL-based: entries expire after a set duration regardless of access frequency. Not really an eviction policy but often used alongside one. Useful for data that becomes stale after a known duration (session tokens, OAuth tokens, temporary computations).
Random: evict a random entry. Surprisingly effective for uniform access patterns and very cheap to implement. Used as a baseline for comparison.
Advanced policies: W-TinyLFU (used by Caffeine, a high-performance Java caching library) combines a small admission window (LRU) with a large main space (segmented LRU). New entries go into the window. Only entries that are accessed frequently enough (tracked by a count-min sketch) are promoted to the main space. This protects against cache pollution from one-time scans while keeping frequently accessed items.
Choosing: if your access pattern has clear temporal locality (most web applications), use LRU. If you have a stable set of hot items (popular products, trending content), use LFU. If you want the best general-purpose performance, use W-TinyLFU (Caffeine). Profile your actual access patterns with cache hit rate monitoring before optimizing.
Follow-up questions:
- How does Redis implement approximate LRU?
- What is cache pollution and how does W-TinyLFU prevent it?
- How do you benchmark eviction policies for your specific workload?
4. How do you design a multi-layer caching architecture?
What the interviewer is really asking: Can you design a caching system that balances latency, cost, and consistency across multiple tiers?
Answer framework:
A multi-layer cache places faster, smaller caches close to the client and larger, slower caches closer to the data source. Each layer absorbs misses from the layer above it.
Layer 1 - Browser/client cache: HTTP cache headers (Cache-Control, ETag, Last-Modified). Zero network latency. Capacity limited by client device. Best for static assets (CSS, JS, images) and API responses that change infrequently. Use Cache-Control: max-age=3600, immutable for versioned assets.
Layer 2 - CDN cache: Content delivery network caches at edge locations worldwide. Sub-10ms latency for cached content. Massive capacity. Best for static content, media, and public API responses. Use Vary headers to ensure correct cache key construction.
Layer 3 - Application-level in-process cache: an in-memory cache within the application process (Caffeine for Java, lru-cache for Node.js). Sub-microsecond latency (no network hop). Limited by application memory. Best for hot data that is read on every request (configuration, feature flags, frequently accessed database objects). Challenge: each application instance has its own cache, so consistency across instances requires coordination.
Layer 4 - Distributed cache (Redis, Memcached): shared cache accessible by all application instances. Sub-millisecond latency over the network. Large capacity (hundreds of GB). Best for session data, computed results, database query results. This is the most important caching layer for most applications.
Layer 5 - Database cache: the database's own buffer pool (PostgreSQL shared_buffers, MySQL InnoDB buffer pool). Caches data and index pages in memory. Managed by the database engine. Ensure the buffer pool is large enough to hold the working set.
Consistency across layers: invalidation must propagate through all layers. When data changes: invalidate the distributed cache, CDN (purge API), and application-level cache. Client caches rely on TTL or ETag revalidation. Using short TTLs for the upper layers and event-based invalidation for the lower layers balances consistency with performance.
Sizing each layer: use the Pareto principle as a starting point. If 20% of your data serves 80% of requests, the application cache should hold the top 1-5% (extremely hot data), the distributed cache should hold the top 20-30%, and the CDN handles static content. Monitor hit rates per layer and adjust.
Follow-up questions:
- How do you prevent the cache layers from amplifying a thundering herd?
- How do you monitor cache performance across all layers?
- When would you skip a layer entirely?
5. How does consistent hashing work in distributed caches?
What the interviewer is really asking: Do you understand the data distribution mechanism that makes distributed caches scalable?
Answer framework:
The problem: distributing cache keys across N nodes. Naive approach: hash(key) mod N. This works until you add or remove a node: when N changes, most keys map to a different node, causing a massive cache miss storm (all keys rehash simultaneously).
Consistent hashing solves this: map both keys and nodes onto a circular hash space (ring). Each key is assigned to the first node encountered clockwise on the ring. When a node is added, only keys between the new node and the previous node (counterclockwise) move. When a node is removed, only its keys move to the next node clockwise. On average, only K/N keys are remapped (K = total keys, N = total nodes).
Virtual nodes: with only N physical nodes on the ring, the distribution can be uneven (one node might own a larger arc of the ring). Virtual nodes solve this: each physical node maps to 100-200 virtual nodes distributed around the ring. This ensures even key distribution. It also allows heterogeneous nodes: a more powerful machine can have more virtual nodes.
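A minimal hash ring sketch with virtual nodes in Python; MD5 and 100 virtual nodes per physical node are arbitrary illustrative choices.

```python
import bisect
import hashlib

class HashRing:
    """Consistent hashing ring; each physical node is placed as many virtual nodes."""

    def __init__(self, nodes: list[str], vnodes: int = 100):
        self.ring: list[tuple[int, str]] = []
        for node in nodes:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def get_node(self, key: str) -> str:
        """Walk clockwise from the key's hash to the first virtual node."""
        idx = bisect.bisect(self.ring, (self._hash(key), ""))
        if idx == len(self.ring):     # wrap around the end of the ring
            idx = 0
        return self.ring[idx][1]

# ring = HashRing(["cache-a", "cache-b", "cache-c"])
# ring.get_node("user:123")   # -> the node responsible for this key
```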
Implementation in Redis Cluster: Redis uses a hash slot approach (16384 slots). Each key hashes to a slot. Slots are assigned to nodes. When adding a node, some slots are migrated. This is consistent hashing with a fixed number of virtual nodes (slots). The cluster handles slot migration transparently.
Implementation in Memcached: the client library implements consistent hashing (ketama algorithm). The client hashes the key, finds the responsible node, and connects directly. No coordination between Memcached nodes (they are unaware of each other).
Replication in consistent hashing: for fault tolerance, each key is replicated to the next N-1 nodes clockwise on the ring (where N is the replication factor). If a node fails, replicas serve its keys. This is the approach described in Amazon's Dynamo paper and used by Cassandra for data placement.
Follow-up questions:
- What happens during a consistent hashing ring rebalance and how does it affect latency?
- How do you handle hot keys that all map to the same node?
- What is the jump consistent hashing algorithm and when is it better?
6. How do you handle the thundering herd problem?
What the interviewer is really asking: Do you understand this critical cache failure mode and the various mitigation strategies?
Answer framework:
The thundering herd (also called cache stampede): when a popular cache entry expires, hundreds or thousands of requests simultaneously miss the cache and hit the database. This can overwhelm the database, causing a cascading failure.
Mutex/lock approach: when a cache miss occurs, the first request acquires a lock (using Redis SETNX or a similar mechanism), fetches from the database, populates the cache, and releases the lock. Subsequent requests see the lock, wait briefly, and then read from the (now populated) cache. Trade-off: adds latency for the waiting requests and introduces lock management complexity.
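A hedged sketch of the mutex approach with redis-py, using SET NX EX as the lock; the lock key scheme, timeouts, and load_from_db helper are assumptions, and production code would tune the retry bounds.

```python
import json
import time
import redis

r = redis.Redis()

def load_from_db(key: str) -> dict:
    return {"key": key}                 # stand-in for an expensive query

def get_with_lock(key: str, ttl: int = 60, retries: int = 50) -> dict:
    """Cache-aside with a mutex so only one caller rebuilds an expired entry."""
    for _ in range(retries):
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)
        # SET NX EX doubles as a lock with an expiry, avoiding deadlock if the holder dies
        if r.set(f"lock:{key}", "1", nx=True, ex=10):
            try:
                value = load_from_db(key)
                r.setex(key, ttl, json.dumps(value))
                return value
            finally:
                r.delete(f"lock:{key}")
        time.sleep(0.05)                # another caller holds the lock; wait and re-check
    return load_from_db(key)            # give up on the cache rather than block forever
```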
Probabilistic early expiration: instead of all entries expiring at exactly the same time, add random jitter to the TTL. An entry with a base TTL of 60 seconds might actually expire at 55-65 seconds. This spreads out the cache misses over time. Additionally, proactively refresh entries before they expire: if the remaining TTL is less than a threshold, the request triggers an async background refresh while still serving the stale value.
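A tiny sketch of TTL jitter; the 10 percent spread is an arbitrary illustration.

```python
import random

BASE_TTL_SECONDS = 60
JITTER = 0.10          # +/- 10 percent, an arbitrary illustrative spread

def jittered_ttl(base: int = BASE_TTL_SECONDS) -> int:
    """Randomize TTLs so entries populated together do not all expire together."""
    return max(1, int(base * random.uniform(1 - JITTER, 1 + JITTER)))

# r.setex(key, jittered_ttl(), payload)   # used wherever the cache is populated
```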
Stale-while-revalidate: continue serving the stale cached value while asynchronously refreshing it. The first request after TTL triggers the background refresh. All subsequent requests get the stale value until the refresh completes. This eliminates the thundering herd entirely because the cache never truly empties. HTTP supports this natively with Cache-Control: stale-while-revalidate.
Pre-computed cache warming: for predictable high-traffic events (product launches, marketing campaigns), proactively populate the cache before the traffic hits. A background job periodically refreshes hot keys before they expire.
Request coalescing: at the load balancer or proxy level, detect multiple identical requests and send only one to the backend. The response is shared with all waiting requests. This works well for CDNs and API gateways.
For distributed caching specifically: the lock must be distributed (not local), adding latency. Consider the tradeoff between lock-based approaches (add latency for some requests but prevent all database hits) and probabilistic approaches (simpler but less precise).
Follow-up questions:
- How does request coalescing work at the CDN level?
- How do you handle the thundering herd for cache entries with different popularity levels?
- What metrics would you monitor to detect thundering herd events?
7. How do you cache effectively in a microservices architecture?
What the interviewer is really asking: Do you understand the unique caching challenges that arise when data is distributed across multiple services?
Answer framework:
The ownership challenge: in microservices, each service owns its data. When Service A caches data from Service B, who is responsible for invalidating A's cache when B's data changes? If B does not know that A is caching its data, A's cache can become stale indefinitely.
Event-driven cache invalidation: Service B publishes data change events to Kafka. Service A subscribes and invalidates or updates its local cache. This is the cleanest approach: B does not need to know about A's cache, and A is responsible for its own cache freshness. The trade-off is the latency of event propagation (typically seconds).
Cache at the API gateway: the API gateway caches responses from backend services. This is transparent to the services and works well for read-heavy, infrequently changing data. Use HTTP cache headers (Cache-Control, ETag) so services control their own cacheability. The gateway respects these headers without needing service-specific logic.
Embedded caching (in-process): each service instance caches hot data in memory. Fastest access (no network hop) but limited capacity and potential for inconsistency across instances. Use for: configuration data, reference data (country codes, currency rates), and frequently read data with tolerance for staleness.
Shared distributed cache: a centralized Redis cluster that all services use. Clear ownership: each service uses a namespace or prefix for its keys. Advantages: shared infrastructure, large capacity, consistent view across service instances. Disadvantages: the cache becomes a shared dependency (its failure affects all services), network latency on every cache access.
Hybrid approach (recommended): use in-process cache for extremely hot, small, read-mostly data (seconds TTL). Use distributed cache for larger, shared data (minutes TTL). Use event-driven invalidation for data that must be consistent across services. This layered approach balances performance, consistency, and operational complexity.
Discuss cache-related failure modes: if the distributed cache fails, all services suddenly hit their databases simultaneously, potentially causing a cascading failure. Implement circuit breakers around cache calls: if the cache is unreliable, degrade to direct database access with rate limiting to protect the database.
Follow-up questions:
- How do you handle cache warming when deploying a new service instance?
- How do you monitor cache effectiveness across multiple services?
- What happens when the shared cache becomes a performance bottleneck?
8. How do you decide what to cache and what not to cache?
What the interviewer is really asking: Can you apply caching strategically rather than indiscriminately?
Answer framework:
Good candidates for caching: data that is read much more often than written (high read:write ratio), data that is expensive to compute or fetch (complex queries, aggregations, external API calls), data that does not change frequently (user profiles, product catalogs, configuration), data that does not need to be perfectly consistent (a few seconds of staleness is acceptable), and data with predictable access patterns (a small set of hot items serves most requests).
Poor candidates for caching: data that changes with every access (nonces, one-time passwords), data that requires real-time consistency (account balances during a transaction, inventory during checkout), data with a very long tail distribution (every key is accessed equally rarely, so the cache hit rate is low regardless of size), write-heavy data where cache invalidation cost exceeds the read benefit, and data with security implications (caching can leak data across users or tenants).
Quantitative analysis: before caching, measure. What is the current latency of the uncached path? What is the expected cache hit rate? What is the cost of staleness? A cache that has a 10% hit rate is not worth the complexity. A cache that reduces p99 latency from 500ms to 5ms at a 95% hit rate is transformative.
Cache sizing: use the working set principle. If 20% of your data serves 80% of requests, your cache needs to hold that 20%. Monitor the cache hit rate as you adjust size: a hit rate below 80% suggests the cache is too small or the access pattern is not cache-friendly.
Cost analysis: caching is not free. Redis costs money (memory is expensive). Cache infrastructure needs monitoring, alerting, and operational support. Stale data causes bugs. The benefit of caching must outweigh these costs.
Discuss database design alternatives: sometimes optimizing the database query is better than caching the result. Adding an index, denormalizing a table, or using a materialized view can eliminate the need for an application-level cache entirely.
Follow-up questions:
- How do you measure cache hit rate and what is a good target?
- How do you handle the transition when removing a cache that was previously relied upon?
- How do you cache personalized content effectively?
9. Explain how Redis works and its role in caching architectures
What the interviewer is really asking: Do you have practical experience with the most popular caching technology and understand its architecture?
Answer framework:
Redis is an in-memory data structure server. It stores data in memory for fast access and optionally persists to disk. Its single-threaded command execution (an event loop with no context switching and no locks) can handle 100K+ operations per second on a single node.
Data structures beyond simple key-value: Strings (basic caching, counters with INCR), Hashes (object-like structures, partial updates), Lists (queues, recent items), Sets (unique collections, set operations), Sorted Sets (leaderboards, priority queues, time-series), Streams (event log, similar to Kafka topics), and HyperLogLog (approximate cardinality counting with minimal memory).
Persistence options: RDB (point-in-time snapshots at intervals, fast restart but data loss window), AOF (append-only file logging every write, less data loss but slower restart), and both (use AOF for durability, RDB for fast restart fallback). For pure caching, persistence can be disabled entirely.
Redis Cluster: distributes data across multiple nodes using hash slots (16384 slots). Each key maps to a slot, and slots are assigned to nodes. Provides horizontal scaling for both memory and throughput. Automatic failover: each master has one or more replicas. If a master fails, a replica is promoted. Supports multi-key operations only when all keys are on the same slot (use hash tags to co-locate related keys).
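A small illustration of hash tags (the key names are made up): only the substring inside the braces is hashed, so related keys land in the same slot and can be used together in multi-key operations.

```python
# Only the substring inside {} is hashed to choose the slot, so these keys
# are guaranteed to live on the same Redis Cluster node.
profile_key = "{user:123}:profile"
orders_key = "{user:123}:orders"
# r.mget(profile_key, orders_key)   # multi-key operation works: both keys share a slot
```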
Redis Sentinel: provides high availability for non-clustered Redis. Monitors the master, detects failure, and promotes a replica to master. Handles notification of clients about the new master address.
Common Redis pitfalls: memory fragmentation (Redis allocates and frees memory frequently, leading to fragmentation. Use jemalloc and monitor memory usage vs allocated bytes), hot keys (a single key accessed by millions of requests per second can overwhelm a single Redis node. Solutions: replicate hot keys across replicas, or use a local cache in front of Redis), and blocking operations (KEYS command scans the entire keyspace, blocking all other operations. Use SCAN instead).
Redis vs Memcached: Redis is richer (more data structures, persistence, clustering, pub/sub). Memcached is simpler and slightly faster for basic key-value caching. For most applications, Redis is the better choice. Memcached is preferred for very simple caching where the additional features of Redis are not needed.
Follow-up questions:
- How would you migrate from a single Redis instance to Redis Cluster?
- How do you handle Redis memory limits when data grows beyond available RAM?
- When would you choose Memcached over Redis?
10. How do you implement cache warming and pre-loading?
What the interviewer is really asking: Do you think about the cache lifecycle holistically, including the cold start problem?
Answer framework:
The cold cache problem: when a cache is empty (after a restart, deployment, or new instance scaling up), all requests miss the cache and hit the database. This sudden load can overwhelm the database and cause slow responses for users.
Proactive warming strategies:
On-deployment warming: before routing traffic to a new instance, run a warming script that loads hot data into the cache. Identify hot keys from analytics (most accessed keys in the last hour or day). Load them from the database into the cache. Only route traffic to the instance after warming completes.
Shadow traffic warming: for new cache instances, replay a sample of production traffic without serving the responses. The cache populates naturally from the replayed requests. Used by CDNs and cache tiers.
Peer warming: when a new cache node joins a distributed cache, it can copy data from existing nodes rather than loading everything from the database. Redis Cluster does this automatically during slot migration.
Predictive warming: for predictable traffic patterns (daily peaks, scheduled promotions, sports events), pre-load relevant data before the traffic arrives. A batch job runs before the expected peak and populates the cache with data that will be needed.
Gradual traffic shifting: instead of sending full traffic to a cold cache immediately, ramp up traffic gradually (1%, 10%, 50%, 100%). The cache warms naturally under controlled load. This is similar to canary deployment but for cache state.
Reactive strategies (for when you cannot warm proactively):
Request coalescing: when multiple requests miss the cache for the same key, only one request fetches from the database. Others wait for the result. Prevents the thundering herd during cache warming.
Database connection limiting: limit the number of concurrent database queries during cache misses. Queue excess requests and serve them as the cache fills. This protects the database at the cost of higher latency during warming.
Monitoring during warming: track cache hit rate, database QPS, and response latency during warming. Alert if the database load exceeds safe thresholds. Have a runbook for slowing or pausing warming if the database is struggling.
Follow-up questions:
- How would you warm a cache with billions of keys?
- How do you handle cache warming for personalized data?
- What is the impact of cache warming on deployment time?
11. How do you handle caching for write-heavy workloads?
What the interviewer is really asking: Can you apply caching in scenarios beyond the typical read-heavy use case?
Answer framework:
Write-heavy workloads seem like poor candidates for caching because the cache is constantly invalidated. But there are effective strategies.
Write-behind buffering: absorb bursts of writes in the cache and flush to the database asynchronously. The cache acts as a write buffer. This smooths out write spikes and reduces database IOPS. Redis with AOF persistence can serve this role. Risk: data loss if the cache crashes before flushing. Mitigate with replication and appropriate persistence settings.
Coalescing writes: if the same key is written multiple times in quick succession, the cache can coalesce them into a single database write. Example: a counter incremented 1000 times per second. Instead of 1000 database writes, the cache accumulates increments and flushes the total periodically (every second or every 1000 increments). Use Redis INCR for atomic in-memory increments.
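A sketch of the coalesced-counter idea with redis-py; the key scheme, flush cadence, and add_views_to_db helper are assumptions.

```python
import redis

r = redis.Redis()

def record_view(page_id: str) -> None:
    """Hot path: one atomic in-memory increment, no database write."""
    r.incr(f"views:{page_id}")                 # hypothetical key scheme

def add_views_to_db(page_id: str, delta: int) -> None:
    pass                                       # stand-in for UPDATE ... SET views = views + delta

def flush_views(page_id: str) -> None:
    """Run periodically: move the accumulated count to the database in one write."""
    count = r.getset(f"views:{page_id}", 0)    # read and reset atomically
    if count and int(count) > 0:
        add_views_to_db(page_id, int(count))
```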
Write-through with batch flushing: writes go to the cache immediately (for fast reads) and are batched for database writes (every 100ms or every 100 entries). This gives the read performance of write-through with reduced database write load.
Sharded counters: for high-contention counters (view counts, like counts), split the counter across multiple cache keys (counter:1, counter:2, ..., counter:N). Each write goes to a random shard. Reads sum all shards. This reduces contention on any single key.
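A sharded-counter sketch with redis-py; 16 shards is an arbitrary illustrative choice.

```python
import random
import redis

r = redis.Redis()
NUM_SHARDS = 16     # arbitrary illustrative shard count

def increment(counter: str) -> None:
    """Each write lands on a random shard, spreading contention across keys."""
    r.incr(f"{counter}:shard:{random.randrange(NUM_SHARDS)}")

def read_total(counter: str) -> int:
    """Reads sum every shard; cheap as long as NUM_SHARDS stays small."""
    values = r.mget([f"{counter}:shard:{i}" for i in range(NUM_SHARDS)])
    return sum(int(v) for v in values if v is not None)
```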
Event aggregation: instead of caching individual writes, cache aggregated results. Example: instead of caching each page view event, maintain a running count in the cache. Flush the count to the analytics database periodically.
Discuss the consistency trade-off: all write-caching strategies introduce a window where the database does not have the latest data. If the cache fails during this window, data is lost. The trade-off between write performance and durability must align with business requirements. Use replication and persistence to minimize the risk.
Follow-up questions:
- How do you ensure data durability in a write-behind cache?
- How does the write-behind pattern interact with database replication?
- When is write-behind caching appropriate for financial data?
12. How do you monitor and troubleshoot cache performance?
What the interviewer is really asking: Do you have operational experience with caches in production?
Answer framework:
Key metrics to monitor:
Hit rate: the most important cache metric. (hits / (hits + misses)) * 100. A healthy cache hit rate is above 90% for most applications. Below 80% suggests the cache is too small, the TTL is too short, or the access pattern is not cache-friendly. Track per cache instance and per key prefix.
Latency: p50, p95, p99 for both cache hits and cache misses. A cache hit should take under 1ms for a networked cache and under 100 microseconds for an in-process cache. Cache miss latency includes the database fetch time. Monitor for latency degradation that might indicate network issues or cache contention.
Memory usage: current memory vs maximum memory. Eviction rate (entries evicted per second). If evictions are high, the cache is under-provisioned. If memory usage is low, the cache might be over-provisioned (wasting resources).
Connection count: current connections vs maximum connections. Connection churn rate. High churn suggests connection pooling issues in the application.
Key metrics for Redis specifically: info memory (fragmentation ratio, used memory vs RSS), info stats (keyspace hits, keyspace misses, evicted keys), info clients (connected clients, blocked clients), and slowlog (commands that took longer than a threshold).
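A small helper, assuming redis-py, that derives the hit rate from the INFO stats counters (which are cumulative since the last restart).

```python
import redis

r = redis.Redis()

def hit_rate_percent() -> float:
    """Cache hit rate computed from Redis keyspace counters."""
    stats = r.info("stats")
    hits = stats["keyspace_hits"]
    misses = stats["keyspace_misses"]
    total = hits + misses
    return 100.0 * hits / total if total else 0.0
```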
Troubleshooting common issues:
Low hit rate: analyze access patterns. Are you caching the right data? Is the TTL too short? Is the cache too small? Use Redis key analysis tools (redis-cli --bigkeys, redis-cli --memkeys) to understand what is in the cache.
High latency: check network latency between application and cache. Check if slow commands are blocking (use slowlog). Check if the cache is swapping to disk (memory pressure).
Hot keys: a single key receiving disproportionate traffic. Identify with MONITOR command (use briefly, it is expensive) or client-side tracking. Solutions: replicate hot keys to read replicas, use local in-process cache for hot keys, or split the hot key into multiple keys.
Memory fragmentation: the ratio of RSS to used memory. If greater than 1.5, Redis is using significantly more memory than needed for the data. Restart Redis to defragment, or enable active defragmentation (CONFIG SET activedefrag yes in Redis 4+).
Follow-up questions:
- How do you create alerts for cache health without alert fatigue?
- How do you debug a cache stampede in production?
- How do you capacity plan for cache infrastructure?
13. How do you handle caching for personalized content?
What the interviewer is really asking: Can you apply caching to the hardest caching use case, where every user sees different content?
Answer framework:
Personalized content (news feeds, product recommendations, dynamic pricing) is the hardest to cache because each user has a unique view. Naive approaches (cache per user) often have low hit rates and high memory usage.
Decomposition strategy: break the page into cacheable and non-cacheable components. Static content (header, footer, navigation, CSS, images) is cached at the CDN. Shared dynamic content (trending topics, popular products, editorial content) is cached in the distributed cache. Personalized content (recommendations, feed, notifications) is either served fresh or cached per user.
Per-user caching: cache the personalized result with a key that includes the user ID (user:123:recommendations). This works when: the number of active users is manageable (millions, not billions), personalized data changes infrequently (recommendations update every few hours), and the personalized data is expensive to compute. Pre-compute recommendations in batch and cache the results.
Segment-based caching: instead of caching per user, cache per user segment. Users with similar profiles see the same cached result. Example: cache recommendations by (age group, location, interest category) rather than by individual user. This dramatically reduces the number of cache entries while providing "good enough" personalization. Used by many e-commerce platforms.
Edge-side includes (ESI): the CDN assembles pages from multiple fragments. Static fragments are cached at the edge. Dynamic fragments are fetched from the origin. The CDN combines them into the final response. This moves composition logic to the CDN, reducing origin load.
Client-side personalization: serve a generic cached page and personalize on the client using JavaScript. The generic page is highly cacheable (CDN, browser). The personalization data is fetched via a separate API call (small payload, per-user cache). This maximizes cacheability while still providing personalization. Trade-off: slower personalization (requires JS execution and an API call).
For scalability: combine segment caching (for the majority of users) with per-user caching (for high-value users or recently active users). This optimizes memory usage while providing the best experience for the most important users.
Follow-up questions:
- How do you invalidate personalized caches when user preferences change?
- How do you cache search results that depend on user context?
- What is the memory overhead of per-user caching at 100M users?
14. How do you design a cache for high availability?
What the interviewer is really asking: Can you design a caching system that survives failures without causing cascading problems?
Answer framework:
Cache replication: Redis supports master-replica replication. The master handles writes, replicas handle reads. If the master fails, a replica is promoted. Use Redis Sentinel or Redis Cluster for automatic failover. Configure at least one replica per master.
Redis Cluster for high availability: distributes data across multiple masters, each with one or more replicas. If a master fails, its replica takes over. The cluster continues operating as long as a majority of masters are available and each failed master has at least one surviving replica.
Multi-region caching: for global applications, deploy cache clusters in each region. Options: independent caches per region (each region warms independently, no cross-region latency, but data might differ between regions) or replicated caches (one primary region writes, other regions have replicas. Cross-region replication lag means non-primary regions might serve slightly stale data).
Graceful degradation: when the cache is unavailable, the system must continue functioning (slower, but functional). Implement circuit breakers around cache calls. If the cache is down, fall back to the database. But protect the database from the sudden load: use connection limiting, request queuing, and rate limiting. Consider serving stale cached data from a local in-process cache as a last resort.
Split cache approach: do not put all data in one cache cluster. Split by domain (user cache, product cache, session cache). This limits the blast radius of a cache failure: if the product cache fails, user sessions and authentication still work.
Persistence for recovery: enable Redis persistence (RDB + AOF) so that a restarted cache instance does not start cold. The restart loads data from disk, providing immediate cache hits. This significantly reduces recovery time after a failure.
Discuss the anti-pattern: treating the cache as a primary data store. If losing the cache means losing data, you have a durability problem, not a caching problem. The cache should always be rebuildable from the source of truth.
Follow-up questions:
- How do you handle a regional cache failure in a multi-region deployment?
- What is the failover time for Redis Sentinel vs Redis Cluster?
- How do you test cache failure scenarios in a staging environment?
15. How do you implement cache-aside with strong consistency guarantees?
What the interviewer is really asking: Can you solve the most common caching consistency problem in real systems?
Answer framework:
The standard cache-aside pattern has a consistency gap: between the database write and the cache invalidation, stale data can be served. For most applications, this is acceptable. For some (inventory, account balances), it is not.
The classic race condition: Thread A reads from cache (miss), reads from DB (value V1). Meanwhile, Thread B updates DB to V2 and deletes cache. Thread A writes V1 to cache. Now the cache has V1, but the DB has V2. The cache is stale and will remain stale until TTL expires.
Solution 1 - Delete, never update: instead of updating the cache on writes, always delete the cache entry. The next read will miss and load the fresh value. This avoids the race where an update writes a stale value. But the race described above (read-miss coinciding with a write) can still occur.
Solution 2 - Versioned cache entries: include a version number (or timestamp) with each cache entry. When writing to the cache, use a compare-and-set (CAS) operation that only writes if the version is newer than what is currently cached. Redis supports this with WATCH/MULTI/EXEC or Lua scripts.
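One way Solution 2 could be sketched with a Lua script in redis-py, assuming each entry is stored as a hash holding a value and a version field; the field names and TTL are illustrative.

```python
import redis

r = redis.Redis()

# Overwrite the cached entry only if the incoming version is newer than the
# stored one; returns 1 if written, 0 if the write was rejected as stale.
SET_IF_NEWER = """
local current = tonumber(redis.call('HGET', KEYS[1], 'version') or '-1')
if tonumber(ARGV[1]) > current then
    redis.call('HSET', KEYS[1], 'version', ARGV[1], 'value', ARGV[2])
    redis.call('EXPIRE', KEYS[1], ARGV[3])
    return 1
end
return 0
"""
set_if_newer = r.register_script(SET_IF_NEWER)

def cache_set(key: str, value: str, version: int, ttl: int = 300) -> bool:
    """Returns False when a concurrent writer already cached a newer version."""
    return bool(set_if_newer(keys=[key], args=[version, value, ttl]))
```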
Solution 3 - Facebook's lease approach: when a cache miss occurs, the cache server issues a lease token. The reader uses this token when setting the value. If the entry was invalidated between the miss and the set (because a write occurred), the lease token is invalidated, and the set is rejected. This prevents the stale write from succeeding.
Solution 4 - Write-through with locks: on a write, lock the cache key, update the database, update the cache, unlock. On a read miss, lock the cache key, read from the database, update the cache, unlock. The lock prevents concurrent reads and writes from conflicting. Trade-off: lock contention adds latency.
Solution 5 - Change data capture: use database CDC (Debezium) to capture all changes from the database's transaction log. A consumer process updates the cache based on the CDC stream. This ensures the cache reflects every database change and handles the ordering correctly (CDC preserves the transaction log order). See our event-driven architecture guide for implementation details.
Practical recommendation: for most systems, use delete-on-write with short TTL. For systems requiring strong consistency, use CDC-based cache updates. The CDC approach is the most reliable because it is driven by the database's own transaction log, eliminating application-level race conditions.
Follow-up questions:
- How does Facebook's memcache lease mechanism work in detail?
- What is the latency overhead of versioned cache entries?
- How does CDC-based cache invalidation handle database failover?
Common Mistakes in Caching Interviews
- Treating cache as the source of truth. The cache should always be rebuildable from the database. If losing the cache means losing data, the architecture is wrong.
- Not discussing invalidation. Adding a cache without a clear invalidation strategy is a recipe for stale data bugs that are difficult to diagnose.
- Using the same TTL for everything. Different data types have different staleness tolerances. Configuration data can have a TTL of minutes, user sessions need seconds, and content can tolerate hours.
- Ignoring failure modes. What happens when the cache is down? If the answer is "the system fails," you have created a single point of failure.
- Not considering memory costs. Caching everything is expensive. Be selective about what to cache based on access frequency and computation cost.
How to Prepare for Caching Interviews
Study the Facebook memcache paper, which is the definitive resource on distributed caching at scale. Understand how Redis works internally (single-threaded event loop, data structures, persistence, clustering).
Practice designing caching strategies for different scenarios: read-heavy API, write-heavy analytics, personalized content, multi-region deployment. For each, discuss the caching strategy, invalidation approach, failure handling, and monitoring.
Review our caching strategies guide, Redis deep dive, and the system design interview guide. Explore learning paths for structured preparation. For staff-level roles, see the senior to staff engineer transition.