
Redis Interview Questions for Senior Engineers (2026)

Top Redis interview questions with detailed answer frameworks covering data structures, persistence, clustering, caching strategies, and production patterns used at companies like Google, Amazon, and Uber.

20 min read · Updated Apr 21, 2026
interview-questions · redis · caching · senior-engineer · distributed-systems

Why Redis Expertise Matters in Senior Engineering Interviews

Redis has evolved from a simple caching layer into a versatile data structure server that underpins critical infrastructure at virtually every large-scale technology company. Senior engineers are expected to understand Redis not just as a key-value cache but as a tool for distributed locking, rate limiting, real-time analytics, session management, message brokering, and leaderboard systems. Interviewers probe for this depth because Redis decisions directly affect system reliability, latency, and cost.

At the senior level, interviewers want to hear you articulate the precise trade-offs of Redis persistence strategies, explain why Redis Cluster's hash slot design matters for your data model, and describe how you have handled Redis memory pressure in production. They expect you to know when Redis is the right tool and when it is not, demonstrating the judgment that distinguishes a senior engineer from someone who uses Redis because it is fast without understanding its limitations.

The questions in this guide reflect what candidates encounter at companies like Google, Amazon, and other top-tier organizations. Each answer framework provides the structure to demonstrate senior-level reasoning: articulate the mechanism, explain the trade-offs, reference production experience, and address failure modes. For a broader preparation strategy, see our system design interview guide and explore the learning paths tailored to senior engineers. For a deep dive into Redis internals, see how Redis works.

1. Explain Redis's single-threaded architecture and why it is still fast.

What the interviewer is really asking: Do you understand the fundamental design decision behind Redis, why it works for most workloads, and what its actual limitations are?

Answer framework:

Redis processes commands using a single main thread for all data operations. This is a deliberate design choice, not a limitation. The single-threaded model eliminates the need for locks, mutexes, and synchronization between threads when accessing data structures. Every operation is atomic by default because nothing else is modifying the data concurrently. This simplicity is what makes Redis predictable: you can reason about operation ordering without worrying about race conditions.

Redis achieves high throughput despite being single-threaded because of several factors. First, all data resides in memory. Memory access latency is roughly 100 nanoseconds, compared to 10 milliseconds for a disk seek on an HDD or 100 microseconds on an SSD. This means Redis can execute millions of simple operations per second because the bottleneck is never waiting for I/O on the data path.

Second, Redis uses I/O multiplexing (epoll on Linux, kqueue on macOS) to handle thousands of client connections on a single thread. Instead of spawning a thread per connection, Redis registers all client sockets with the kernel and is notified when any socket has data ready to read. This is the same pattern that Nginx and Node.js use. The event loop reads a command from a ready socket, executes it (which is fast because the data is in memory), writes the response, and moves to the next ready socket.

Third, Redis's protocol (RESP) is simple and efficient to parse. Commands are structured as arrays of bulk strings with length prefixes, so parsing never requires backtracking or complex state machines.

Redis 6.0 introduced I/O threading for reading client requests and writing responses. The main thread still executes all commands single-threaded, but the I/O work of parsing requests and serializing responses can be offloaded to multiple threads. This improves throughput for workloads that are I/O bound (large payloads, many connections) without changing the single-threaded execution model for commands.

The actual limitation of single-threaded execution is that a slow command blocks everything. A KEYS * command on a database with millions of keys, an LRANGE on a list with millions of elements, or a Lua script that runs for seconds will block all other clients. This is why understanding command time complexity and avoiding O(N) operations on large datasets in production is critical. Use SCAN instead of KEYS, paginate large collections, and set time limits on Lua scripts.
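
As a quick illustration, here is a minimal redis-py sketch (the client library, connection details, and the process_key handler are assumptions, not from the original text) showing the SCAN-based alternative to KEYS:

    import redis

    r = redis.Redis(host="localhost", port=6379)  # assumed local instance

    # KEYS blocks the single event loop for the entire keyspace walk.
    # scan_iter issues repeated SCAN calls with a cursor, so each round
    # trip touches only a bounded slice and other clients interleave.
    for key in r.scan_iter(match="session:*", count=1000):
        process_key(key)  # hypothetical per-key handler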

For a detailed exploration of Redis internals, see how Redis works.

Follow-up questions:

  • How would you diagnose a Redis instance where latency suddenly spiked for all clients?
  • What is the maximum throughput you have achieved with a single Redis instance, and what was the bottleneck?
  • How does Redis's multi-threaded I/O (6.0+) compare to the single-threaded model in terms of latency percentiles?

2. Compare Redis persistence options: RDB snapshots versus AOF. When would you use each?

What the interviewer is really asking: Do you understand the durability trade-offs of each persistence mechanism, including their impact on performance and recovery time, well enough to make the right choice for different workloads?

Answer framework:

Redis offers two persistence mechanisms that serve different durability and performance requirements.

RDB (Redis Database) persistence creates point-in-time snapshots of the entire dataset at configured intervals (for example, save after 900 seconds if at least 1 key changed). The snapshot is generated by forking the Redis process. The child process writes the entire dataset to a temporary file, then atomically replaces the old RDB file. During the snapshot, the parent continues serving requests. Copy-on-write semantics mean that the child shares memory pages with the parent until either modifies a page, at which point the kernel copies it.

RDB advantages: compact single-file format that is easy to back up and transfer, faster restart time because loading a binary snapshot is faster than replaying a log, and no ongoing I/O impact during normal operations (I/O happens only during the periodic snapshot). RDB disadvantages: you lose all data written since the last snapshot if Redis crashes. With a 5-minute snapshot interval, you could lose up to 5 minutes of data. The fork operation can cause a latency spike, especially with large datasets on systems with limited memory, because the kernel must copy the page table (which can take hundreds of milliseconds for a 50GB dataset).

AOF (Append Only File) persistence logs every write command to a file. When Redis restarts, it replays the AOF to reconstruct the dataset. AOF offers three fsync policies: always (fsync after every command, safest but slowest, roughly 200 ops/sec throughput), everysec (fsync once per second, good balance, at most 1 second of data loss), and no (let the OS decide when to flush, fastest but least safe).

AOF advantages: configurable durability down to every command, and the AOF is a human-readable log of operations that can be useful for debugging. AOF disadvantages: the file grows over time (larger than RDB for the same dataset), and replaying a large AOF on restart is slower than loading an RDB snapshot. AOF rewriting (background process that compacts the AOF by generating the minimal set of commands to recreate the current dataset) mitigates file growth but consumes CPU and memory during the rewrite.

Since Redis 4.0 (and by default since 5.0), the recommended approach is to use both RDB and AOF together via hybrid persistence (the aof-use-rdb-preamble option). The AOF rewrite process generates an RDB-format preamble followed by the AOF commands that occurred during the rewrite. This combines RDB's fast loading with AOF's durability. On restart, Redis loads the RDB preamble quickly, then replays only the recent AOF tail.

For a pure caching layer where data loss is acceptable (you can always refill from the source of truth), use RDB-only or no persistence at all. For session stores or any data that would be expensive to regenerate, use AOF with everysec fsync. For financial or transactional data where you cannot afford any loss, use AOF with always fsync and accept the throughput limitation, or consider whether Redis is the right tool for that use case.
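
A hedged redis.conf sketch of the combined setup described above (the thresholds are illustrative, not recommendations):

    # RDB: snapshot if 1+ keys changed in 900s, or 10+ keys in 300s
    save 900 1
    save 300 10
    # AOF with once-per-second fsync, bounding loss to about 1 second
    appendonly yes
    appendfsync everysec
    # Write an RDB-format preamble during AOF rewrites (hybrid persistence)
    aof-use-rdb-preamble yes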

For caching architecture decisions, see the distributed cache with Redis system design.

Follow-up questions:

  • How does the fork-based RDB snapshot interact with container memory limits in Kubernetes?
  • What happens if the AOF file becomes corrupted midway through?
  • How would you handle persistence for a Redis instance with a 100GB dataset?

3. How does Redis Cluster work, and what are its limitations?

What the interviewer is really asking: Do you understand Redis Cluster's hash slot mechanism, resharding process, replication model, and the specific constraints it imposes on multi-key operations?

Answer framework:

Redis Cluster provides horizontal scaling by partitioning data across multiple Redis nodes. The key space is divided into 16,384 hash slots. Each key is mapped to a slot using CRC16(key) mod 16384. Each master node in the cluster owns a subset of these hash slots. For example, in a 3-node cluster, node A might own slots 0-5460, node B owns 5461-10922, and node C owns 10923-16383.

When a client sends a command, the receiving node checks whether it owns the hash slot for the requested key. If it does, it executes the command. If not, it responds with a MOVED redirect telling the client which node owns that slot. Smart clients (like redis-py-cluster, Jedis, and ioredis) cache the slot-to-node mapping and route commands directly to the correct node, avoiding most redirects.

Redis Cluster uses asynchronous replication for high availability. Each master can have one or more replicas. If a master fails, the cluster promotes a replica to master through a consensus mechanism among the remaining masters. The failure detection uses a gossip protocol: nodes exchange heartbeat messages, and if a majority of masters mark a node as unreachable (after the cluster-node-timeout period, typically 15 seconds), it is considered failed. The failover happens automatically, but there is a brief window of unavailability for the slots owned by the failed master.

The most significant limitation is the multi-key operation constraint. Commands that operate on multiple keys (MGET, MSET, SUNION, pipeline with multiple keys) only work if all involved keys map to the same hash slot. Redis provides hash tags for this purpose: if a key contains a substring in curly braces like user:{1234}:profile and user:{1234}:settings, only the content inside the braces ({1234}) is hashed. This guarantees both keys land on the same slot. However, this creates a trade-off: concentrating related keys on one node limits the scalability benefit of clustering for those keys.
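
To make the hash tag mechanics concrete, a minimal redis-py sketch (the cluster endpoint is an assumption):

    from redis.cluster import RedisCluster

    rc = RedisCluster(host="localhost", port=7000)  # assumed cluster node

    # Only the "{1234}" portion is hashed, so both keys map to the same
    # slot and the multi-key MGET below is legal in cluster mode.
    rc.set("user:{1234}:profile", "...")
    rc.set("user:{1234}:settings", "...")
    print(rc.mget("user:{1234}:profile", "user:{1234}:settings"))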

Resharding (moving hash slots between nodes) is an online operation but has implications. During slot migration, keys in the migrating slot may be on the source or destination node. Redis handles this with ASK redirects that tell the client to try the destination node for that specific request. The migration process moves keys one at a time using MIGRATE, which is atomic per key but not atomic for the entire slot. Resharding a large slot with millions of keys takes time and generates network traffic.

Another limitation is that Redis Cluster does not support multiple databases (only database 0) and does not support SELECT. Lua scripts and transactions (MULTI/EXEC) work but only for keys in the same hash slot. The pub/sub model in Redis Cluster is different: published messages are broadcast to all nodes, which means pub/sub traffic scales with the number of nodes, not the number of subscribers.

For a detailed comparison of Redis Cluster with alternatives, see Redis vs Memcached and the distributed cache design.

Follow-up questions:

  • How would you handle a scenario where you need to perform a transaction across keys on different hash slots?
  • What happens during a network partition that splits the cluster into two halves?
  • How do you monitor Redis Cluster health and detect slot coverage issues?

4. What are the common caching patterns, and when would you use each with Redis?

What the interviewer is really asking: Beyond knowing that Redis is used for caching, can you articulate the specific caching strategies, their consistency implications, and how to choose between them for different data access patterns?

Answer framework:

The choice of caching pattern determines the consistency, performance, and complexity of your system. There are four primary patterns, each with distinct characteristics.

Cache-aside (lazy loading) is the most common pattern. The application checks the cache first. On a cache hit, it returns the cached value. On a cache miss, it reads from the database, writes the result to the cache, and returns it. The application owns all caching logic. Advantages: only requested data is cached (no wasted cache space), and the cache naturally adapts to shifting access patterns. Disadvantages: the first request for any data always hits the database (cold start), and stale data persists until TTL expiration. This pattern works well for read-heavy workloads with tolerance for brief staleness, like product catalog pages or user profiles.
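
A minimal cache-aside sketch in Python (the db_fetch_product call, key scheme, and TTL are illustrative assumptions):

    import json
    import redis

    r = redis.Redis()

    def get_product(product_id, ttl=300):
        """Cache-aside: check Redis first, fall back to the database."""
        key = f"product:{product_id}"
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)        # cache hit
        row = db_fetch_product(product_id)   # hypothetical database call
        r.set(key, json.dumps(row), ex=ttl)  # populate for the next reader
        return row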

Write-through caching writes data to both the cache and the database on every write. The write is only acknowledged after both succeed. Advantages: the cache always has the latest data, eliminating staleness. Disadvantages: every write has the latency of both the cache write and the database write, and data that is never read still occupies cache space. This pattern suits workloads where read-after-write consistency is critical and the working set fits comfortably in cache memory.

Write-behind (write-back) caching writes to the cache immediately and asynchronously writes to the database later (either after a delay or in batches). Advantages: write latency is reduced to just the cache write, and batching database writes can significantly reduce database load. Disadvantages: if Redis crashes before the async write to the database, data is lost. This pattern is appropriate for write-heavy workloads where temporary data loss is acceptable, like analytics counters or rate limit tracking.

Read-through caching is similar to cache-aside but the cache itself is responsible for loading data from the database on a miss, rather than the application. The application only ever talks to the cache. Advantages: simpler application code with caching logic encapsulated in the cache layer. Disadvantages: requires a cache provider that supports read-through (Redis itself does not natively, but frameworks like Spring Cache and application-level wrappers implement it).

Beyond these patterns, consider cache invalidation strategies. TTL-based expiration is the simplest: set a TTL on every key and accept staleness within the TTL window. Event-driven invalidation uses database change events (like CDC or application-level hooks) to delete or update cache entries when the source data changes, providing fresher data at the cost of complexity. Hybrid approaches use both: TTL as a safety net with event-driven invalidation for timeliness.

The cache stampede problem (also called thundering herd) occurs when a popular key expires and hundreds of concurrent requests simultaneously miss the cache and hit the database. Solutions include locking (only one request fetches from the database while others wait), probabilistic early expiration (each request has a small chance of refreshing the cache before TTL, spreading renewals over time), and background refresh (a separate process refreshes popular keys before they expire).
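
Probabilistic early expiration is compact enough to sketch; this follows the widely cited "x-fetch" formulation, with beta and the function shape as assumptions:

    import math
    import random

    def should_refresh_early(ttl_remaining, recompute_cost, beta=1.0):
        """Decide whether this reader refreshes before the TTL expires.

        log(random()) is negative, so the added term is a random negative
        number whose magnitude is usually around recompute_cost * beta;
        the refresh probability rises smoothly as expiry approaches,
        spreading renewals out instead of stampeding at TTL zero.
        """
        return ttl_remaining + recompute_cost * beta * math.log(random.random()) <= 0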

For a comprehensive caching system design, see the distributed cache with Redis architecture and how Redis works.

Follow-up questions:

  • How do you determine the optimal TTL for a given cache entry?
  • What monitoring would you implement to measure cache effectiveness?
  • How do you handle cache warming after a Redis restart or a new deployment?

5. How would you implement distributed locking with Redis, and what are the pitfalls?

What the interviewer is really asking: Do you understand the subtleties of distributed locking, the Redlock algorithm debate, and the practical failure modes that can break your lock guarantees?

Answer framework:

The simplest Redis lock uses SET with NX (only set if not exists) and EX (expiration): SET lock_key unique_value NX EX 30. This atomically acquires the lock only if it does not exist and sets a 30-second expiration to prevent deadlocks if the lock holder crashes. To release, you must verify the value matches your unique value before deleting, using a Lua script to make the check-and-delete atomic: if redis.call('get', KEYS[1]) == ARGV[1] then return redis.call('del', KEYS[1]) else return 0 end.
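
Put together as a sketch in redis-py (the UUID token, key names, and TTL are illustrative):

    import uuid
    import redis

    r = redis.Redis()

    RELEASE = r.register_script(
        "if redis.call('get', KEYS[1]) == ARGV[1] then "
        "return redis.call('del', KEYS[1]) else return 0 end"
    )

    def acquire(lock_key, ttl=30):
        token = str(uuid.uuid4())
        # NX: only set if absent; EX: auto-expire if the holder crashes
        if r.set(lock_key, token, nx=True, ex=ttl):
            return token
        return None

    def release(lock_key, token):
        # Atomic check-and-delete: never frees another client's lock
        return RELEASE(keys=[lock_key], args=[token]) == 1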

The single-instance lock has a critical weakness: if the Redis master fails after the lock is acquired but before the lock replicates to the replica, a failover promotes the replica, and another client can acquire the same lock. You now have two processes believing they hold the lock.

Redlock, proposed by Salvatore Sanfilippo (Redis creator), attempts to address this by acquiring the lock on N independent Redis instances (typically 5). The lock is considered acquired if the client successfully locks a majority (at least 3 of 5) within a validity time. This tolerates the failure of a minority of instances.

However, Redlock has been heavily criticized by distributed systems researchers, most notably by Martin Kleppmann. The criticism centers on timing assumptions: Redlock assumes that process pauses (garbage collection), network delays, and clock drift are bounded. If a client acquires the Redlock, then experiences a long GC pause that exceeds the lock TTL, the lock expires, another client acquires it, and both clients now proceed concurrently. This violates mutual exclusion.

Kleppmann's recommended alternative is fencing tokens: have the lock service issue a monotonically increasing token with each lock acquisition. The protected resource (typically a database) checks the token and rejects operations with stale tokens. This provides safety even if multiple clients believe they hold the lock.

In practice, for most applications, the single-instance Redis lock with appropriate TTL is sufficient. Use it when the lock protects against duplicate work (idempotent operations where correctness is not compromised by occasional double execution, like sending a notification). For locks that protect critical invariants (like preventing double-spending), use a consensus-based system like ZooKeeper or etcd, which provides true linearizable locks.

Operational pitfalls include setting the TTL too short (the lock expires before the protected work completes, allowing concurrent access), setting the TTL too long (a crashed lock holder blocks all other clients for the duration), and not using unique values for lock release (leading to one client accidentally releasing another client's lock).

Lock extension (renewing the TTL while the work is still in progress) is a common need. Use a background thread that periodically extends the lock. Libraries like Redisson (Java) implement this automatically with a "watchdog" thread.

Follow-up questions:

  • How would you implement a read-write lock using Redis?
  • What happens to your distributed lock if Redis runs out of memory and starts evicting keys?
  • How do you test distributed locking logic to ensure it behaves correctly under failure conditions?

6. Explain Redis data structures beyond strings and when you would use each.

What the interviewer is really asking: Can you leverage Redis's rich data structure library to solve real problems efficiently, rather than treating Redis as a simple key-value store?

Answer framework:

Redis's power comes from its server-side data structures that allow complex operations to be executed atomically in O(1) or O(log N) time, avoiding the network round trips and race conditions of client-side logic.

Lists are doubly-linked lists that support O(1) push and pop from both ends. Use lists for job queues (LPUSH to enqueue, BRPOP to dequeue with blocking), activity feeds (LPUSH new events, LTRIM to cap the list, LRANGE to paginate), and recent items (last N viewed products). Be cautious with large lists: LINDEX and LINSERT are O(N) because they traverse the list.

Sets are unordered collections of unique strings. Use sets for tagging systems (SADD to tag, SMEMBERS to get tags), unique visitor tracking (SADD user_id, SCARD for the count), and relationship modeling (SINTER for mutual friends, SUNION for combined interests). Sets support powerful set operations: intersection, union, and difference, all executed server-side.

Sorted sets (ZSETs) are sets where each member has a floating-point score, maintained in sorted order. This is arguably Redis's most versatile data structure. Use sorted sets for leaderboards (ZADD to update score, ZREVRANGE for top N, ZRANK for a user's position), time-based feeds (score equals timestamp, ZRANGEBYSCORE for time-range queries), rate limiting (score equals request timestamp, ZRANGEBYSCORE to count recent requests, ZREMRANGEBYSCORE to prune old entries), and priority queues (score equals priority). Sorted sets use a skip list internally, providing O(log N) insertion and O(log N + M) range queries where M is the number of returned elements.

Hashes are maps of field-value pairs attached to a single key. Use hashes to represent objects: HSET user:1234 name "Alice" email "alice@example.com" age 30. This is more memory-efficient than storing each field as a separate key because Redis uses a compact ziplist encoding for small hashes (up to hash-max-ziplist-entries fields, default 128). Hashes also allow atomic field-level updates with HINCRBY and HSET without reading the entire object.

Streams (introduced in Redis 5.0) are append-only log data structures with consumer group support, similar to Kafka topics. Use streams for event sourcing, activity logging, and inter-service communication. Consumer groups allow multiple consumers to process stream entries with at-least-once delivery, acknowledgment tracking, and the ability to reclaim entries from failed consumers using XPENDING and XCLAIM.

HyperLogLog provides probabilistic cardinality estimation using only 12KB of memory regardless of the number of unique elements. PFADD adds elements, PFCOUNT returns the approximate count with a standard error of 0.81 percent. Use HyperLogLog for unique visitor counting, unique search query counting, and any scenario where you need approximate distinct counts at massive scale.

Bitmaps allow bit-level operations on strings. Use SETBIT and GETBIT for feature flags per user, daily active user tracking (one bit per user per day), and bloom filter implementations. BITCOUNT and BITOP provide efficient population counts and bitwise operations across multiple bitmaps.
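
For instance, daily active user tracking with bitmaps might look like this sketch (the dau:{date} key scheme is an assumption):

    import redis

    r = redis.Redis()

    def mark_active(user_id: int, day: str):
        # One bit per user per day, keyed like "dau:2026-04-21"
        r.setbit(f"dau:{day}", user_id, 1)

    def daily_active_count(day: str) -> int:
        return r.bitcount(f"dau:{day}")

    # Users active on BOTH days: AND the bitmaps server-side with BITOP
    r.bitop("AND", "dau:both", "dau:2026-04-20", "dau:2026-04-21")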

Follow-up questions:

  • How would you implement a sliding window rate limiter using sorted sets?
  • When would you choose a stream over a list for a message queue use case?
  • How does the internal encoding of sorted sets change as they grow, and what is the performance impact?

7. How do you handle Redis memory management and eviction in production?

What the interviewer is really asking: Do you understand how Redis uses memory, what happens when it runs out, and how to configure eviction policies to match your workload without data loss in critical paths?

Answer framework:

Redis stores everything in memory, so memory management is the most critical operational concern. The maxmemory configuration sets the maximum memory Redis will use for data. When this limit is reached, Redis's behavior depends on the maxmemory-policy setting.

The eviction policies are: noeviction (return errors for commands that would use more memory, but continue serving reads and deletes), allkeys-lru (evict the least recently used key from the entire keyspace), allkeys-lfu (evict the least frequently used key), volatile-lru (evict the least recently used key that has a TTL set), volatile-lfu (evict the least frequently used key with a TTL), volatile-ttl (evict the key with the shortest remaining TTL), volatile-random (evict a random key with a TTL), and allkeys-random (evict a random key).

For a pure caching workload where all data is expendable and regenerable, use allkeys-lru or allkeys-lfu. LRU works well when recent access is a good predictor of future access. LFU works better when some keys are consistently popular regardless of recency (hot product pages, popular user profiles). Redis implements approximate LRU by sampling (default 5 keys per eviction cycle) rather than tracking true LRU order, which would require too much memory overhead. Increasing maxmemory-samples improves approximation accuracy at a slight CPU cost.

For mixed workloads where some keys are cache data (expendable) and others are application state (must not be evicted), use volatile-lru or volatile-lfu. Set TTLs on cache keys and do not set TTLs on state keys. Redis will only evict keys with TTLs. However, if all TTL keys are evicted and memory is still full, Redis will return errors for writes rather than evicting non-TTL keys. This is a common misconfiguration: teams set volatile-lru thinking it protects non-TTL keys but forget that if cache keys are insufficient to free memory, writes fail entirely.

Memory fragmentation is an underappreciated concern. Redis allocates and frees memory frequently as keys are created, modified, and evicted. Over time, the allocator (jemalloc by default) may have sufficient total free memory but not in contiguous blocks large enough for new allocations. The mem_fragmentation_ratio in INFO memory reveals this: a ratio significantly above 1.0 indicates fragmentation. Redis 4.0+ includes active defragmentation (activedefrag yes) that reorganizes memory in the background, but it consumes CPU.

Memory optimization techniques include using Redis's compact encodings (small hashes, lists, and sorted sets use ziplist/listpack encoding that is much more memory-efficient), shortening key names in high-volume workloads (user:profile:1234 versus u:p:1234 saves bytes multiplied by millions of keys), using OBJECT ENCODING to verify that keys use expected compact encodings, and periodically running MEMORY DOCTOR and MEMORY USAGE to identify memory-heavy keys.
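
A small audit sketch combining these commands (the 1 MB threshold is arbitrary):

    import redis

    r = redis.Redis(decode_responses=True)

    # Walk the keyspace incrementally; report keys over ~1 MB along
    # with their internal encoding (samples=0 counts every element).
    for key in r.scan_iter(count=1000):
        size = r.memory_usage(key, samples=0)
        if size and size > 1_000_000:
            print(key, size, r.object("encoding", key))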

For deployment sizing, reserve 25-30 percent of available memory for overhead: Redis's own memory usage, fork-based persistence (the child process needs memory for copy-on-write pages), and output buffers for clients and replicas.

Follow-up questions:

  • How would you handle a scenario where Redis eviction causes a cache stampede on your database?
  • What is the memory overhead per key in Redis, and how does it affect your capacity planning?
  • How do you detect and handle memory leaks caused by keys that are created but never expire or get deleted?

8. How would you use Redis for rate limiting in a distributed system?

What the interviewer is really asking: Can you implement rate limiting that is correct across multiple application instances, handles edge cases like window boundaries, and performs under high concurrency?

Answer framework:

Rate limiting is a classic Redis use case because it requires atomic counters with expiration, shared across multiple application instances. There are several algorithms, each with different characteristics.

Fixed window counting is the simplest approach. Use a key like ratelimit:{user_id}:{minute_timestamp} and INCR it for each request. Set EXPIRE on the key for cleanup. Check if the count exceeds the limit. The problem with fixed windows is the boundary effect: a user could send the maximum number of requests at the end of one window and the maximum at the beginning of the next window, effectively doubling their rate over a short period. Implementation in a single atomic operation: use a Lua script that INCRs the counter, sets EXPIRE if the key is new, and returns the current count.

Sliding window log uses a sorted set where each entry is a request with its timestamp as the score. For each request: ZADD the new request, ZREMRANGEBYSCORE to remove requests outside the window, ZCARD to count requests in the window, and check against the limit. This is the most accurate algorithm but uses more memory (one entry per request). Wrap all four commands in a Lua script for atomicity.
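
One possible shape for that script, sketched with redis-py (key names, limits, and the unique-member convention are assumptions; this variant checks the count before adding rather than after):

    import time
    import uuid
    import redis

    r = redis.Redis()

    SLIDING_WINDOW = r.register_script("""
    local key    = KEYS[1]
    local now    = tonumber(ARGV[1])
    local window = tonumber(ARGV[2])
    local limit  = tonumber(ARGV[3])
    redis.call('ZREMRANGEBYSCORE', key, '-inf', now - window)
    if redis.call('ZCARD', key) < limit then
      redis.call('ZADD', key, now, ARGV[4])  -- unique member per request
      redis.call('EXPIRE', key, window)
      return 1
    end
    return 0
    """)

    allowed = SLIDING_WINDOW(
        keys=["rl:user:1234"],
        args=[time.time(), 60, 100, str(uuid.uuid4())],
    ) == 1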

Sliding window counter is a memory-efficient approximation. Maintain two fixed window counters (current window and previous window). Estimate the count in the sliding window as: previous_window_count * overlap_percentage + current_window_count. For example, if the window is 60 seconds and you are 15 seconds into the current window, the estimate is: previous_count * 0.75 + current_count. This uses only two keys regardless of request volume.

Token bucket is the most flexible algorithm, allowing burst traffic up to a maximum while enforcing an average rate. Implement with a hash containing two fields: tokens (current count) and last_refill (timestamp). On each request, a Lua script calculates how many tokens to add based on elapsed time (rate * elapsed_seconds, capped at bucket_size), deducts one token, and returns whether the request is allowed. Token bucket naturally handles bursts (accumulated tokens) and smooth rate enforcement.
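
A token bucket sketch along those lines (parameters are illustrative; note the timestamp comes from the client, so clock skew across app servers adds a small error):

    import time
    import redis

    r = redis.Redis()

    TOKEN_BUCKET = r.register_script("""
    local rate  = tonumber(ARGV[1])  -- tokens added per second
    local burst = tonumber(ARGV[2])  -- bucket capacity
    local now   = tonumber(ARGV[3])
    local b = redis.call('HMGET', KEYS[1], 'tokens', 'last_refill')
    local tokens = tonumber(b[1]) or burst
    local last   = tonumber(b[2]) or now
    -- Refill based on elapsed time, capped at the bucket size
    tokens = math.min(burst, tokens + (now - last) * rate)
    local allowed = 0
    if tokens >= 1 then
      tokens = tokens - 1
      allowed = 1
    end
    redis.call('HSET', KEYS[1], 'tokens', tokens, 'last_refill', now)
    redis.call('EXPIRE', KEYS[1], math.ceil(burst / rate) * 2)
    return allowed
    """)

    # Hypothetical policy: 5 requests/sec average, bursts up to 20
    allowed = TOKEN_BUCKET(keys=["tb:user:1234"], args=[5, 20, time.time()]) == 1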

For distributed systems, the key consideration is that all application instances must share the same Redis counters. This works seamlessly with a single Redis instance or Redis Cluster (all keys for one user hash to the same slot if you use hash tags). Latency matters: if the rate limit check adds more than a few milliseconds to every request, consider using a local in-memory rate limiter as a first tier (reject obviously over-limit clients immediately) with Redis as the authoritative distributed check.

Edge cases to address: what happens if Redis is unavailable? Failing open (allowing all requests) risks abuse, while failing closed (rejecting all requests) causes an outage. A common approach is to fail open with a degraded local rate limiter and alert on Redis unavailability. Also consider rate limit headers (X-RateLimit-Remaining, X-RateLimit-Reset) in API responses to help clients self-regulate.

For system design context, see the URL shortener design which includes rate limiting as a component.

Follow-up questions:

  • How would you implement tiered rate limits (different limits for different API tiers) in Redis?
  • What happens to your rate limiter during a Redis failover that takes 15 seconds?
  • How would you rate limit by IP address in a system behind multiple load balancers?

9. Explain Redis replication and the consistency guarantees it provides.

What the interviewer is really asking: Do you understand the replication mechanism, the conditions under which data can be lost, and how to configure replication for your durability requirements?

Answer framework:

Redis uses asynchronous leader-follower replication. The master accepts all writes and propagates them to replicas. Replicas are exact copies of the master and serve read requests to scale read throughput.

The replication process begins with a full synchronization. When a replica connects to a master, the master forks and generates an RDB snapshot (full sync). While the snapshot is being generated and transferred, the master buffers all new write commands in the replication backlog (an in-memory circular buffer). After the replica loads the snapshot, the master sends the buffered commands and then streams new commands to the replica in real time.

If a replica temporarily disconnects and reconnects, it attempts partial resynchronization using the replication backlog. The replica tells the master its replication offset (how far it has processed), and if the offset is still within the backlog, the master sends only the missing commands. If the offset has fallen outside the backlog (the backlog was overwritten because the disconnection was too long), a full resynchronization is required. Size the repl-backlog-size based on your write throughput and expected disconnection duration.
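
To make the sizing concrete with hypothetical numbers: at 5 MB/s of write traffic, tolerating a 60-second disconnection needs roughly 5 MB/s × 60 s = 300 MB of backlog:

    # Illustrative sizing only: 5 MB/s writes x 60 s tolerated disconnect
    repl-backlog-size 300mb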

The critical consistency implication is that asynchronous replication means acknowledged writes can be lost. If a client writes to the master, receives acknowledgment, and then the master crashes before replicating the write, the promoted replica will not have that write. This is a fundamental trade-off: synchronous replication would guarantee no data loss but at the cost of write latency equal to the round trip to the slowest replica.

Redis provides the WAIT command for semi-synchronous replication: WAIT numreplicas timeout blocks the client until the specified number of replicas have acknowledged the write, or the timeout expires. This does not make replication synchronous (the write is still committed on the master first), but it provides a stronger durability guarantee when used correctly. However, WAIT cannot prevent data loss during automatic failover if Sentinel promotes a replica that has not received the latest writes.

Redis Sentinel manages high availability by monitoring masters and replicas, detecting master failure, and promoting a replica. Sentinel uses a quorum of Sentinel instances (typically 3 or 5) to agree on a failure before triggering failover. The min-replicas-to-write and min-replicas-max-lag configurations can prevent the master from accepting writes when too few replicas are connected or replication lag is too high, reducing the window for data loss.

In Redis Cluster, each master has its own replicas, and failover happens per shard. The cluster determines failure through gossip protocol consensus among master nodes. During failover, the cluster may refuse commands for the affected hash slots for a brief period (typically seconds), sacrificing availability to preserve consistency, as the CAP theorem predicts for a partitioned system.

Follow-up questions:

  • How do you monitor replication lag, and what would you do if it consistently exceeds your SLA?
  • What happens when a master has multiple replicas and one is significantly behind the others?
  • How does replication interact with Redis persistence during a replica promotion?

10. How would you design a leaderboard system using Redis?

What the interviewer is really asking: Can you apply Redis sorted sets to a real-world use case, handle scale challenges, and address the nuances like tie-breaking, real-time updates, and leaderboard partitioning?

Answer framework:

Redis sorted sets are the canonical data structure for leaderboards because they maintain elements sorted by score with O(log N) insertion and O(log N + M) range retrieval, where M is the number of results returned.

The basic implementation uses ZADD to set or update a player's score, ZREVRANGE (or ZRANGE with REV) to get the top N players, ZREVRANK to get a specific player's rank, and ZSCORE to get a player's score. For a game with 10 million players, ZADD and ZREVRANK are O(log 10M) which is roughly 23 operations in the skip list, completing in microseconds.
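
In redis-py, the core operations are a few lines (key and member names are illustrative):

    import redis

    r = redis.Redis(decode_responses=True)
    LB = "leaderboard:global"  # assumed key name

    r.zadd(LB, {"alice": 4200, "bob": 3100})         # set or update scores
    top10 = r.zrevrange(LB, 0, 9, withscores=True)   # top N
    rank = r.zrevrank(LB, "alice")                   # 0-based rank

    # "Players near me": ten neighbors centered on alice's rank
    if rank is not None:
        nearby = r.zrevrange(LB, max(rank - 5, 0), rank + 5, withscores=True)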

Tie-breaking requires thought because sorted sets order members with equal scores lexicographically by member name. If two players have the same score, their relative ordering is arbitrary from a game perspective. Solutions include encoding a secondary sort criterion into the score (for example, score * 1000000 + (MAX_TIMESTAMP - timestamp) to break ties by who achieved the score first), using a combined score format like score.timestamp as a floating-point number (limited by float64 precision), or maintaining a separate sorted set for tie-breaking that is consulted only when scores are equal.

For real-time leaderboards with millions of updates per second, a single sorted set on a single Redis instance may become a bottleneck. Partitioning strategies include time-based partitioning (separate sorted sets for daily, weekly, and all-time leaderboards), geographic partitioning (separate leaderboards per region, with a global aggregation process), and score-range partitioning (split the leaderboard into score buckets, which complicates rank calculation but distributes load).

Relative leaderboards (show me players ranked near me) use ZREVRANGE with the player's rank as the center: get rank with ZREVRANK, then ZREVRANGE from rank-5 to rank+5. This is O(log N + M) where M is the number of neighbors.

For leaderboards that need to expire entries (for example, weekly leaderboards that reset every Monday), use key naming with the time period: leaderboard:weekly:2026-W16. Create a new key each period and set an expiration on the old one. To display the current leaderboard, always read from the current period's key.

For large-scale leaderboards that exceed single-instance memory, consider using Redis Cluster with hash tags to keep related leaderboard operations on the same node, or use a hybrid approach where Redis holds the top 10,000 players and a database holds the full ranking with materialized views for rank calculation.

This pattern is commonly combined with other Redis structures: use pub/sub or streams to notify clients of score changes in real time, and use hashes to store player metadata (name, avatar) alongside the sorted set.

Follow-up questions:

  • How would you implement a leaderboard that shows percentile rank rather than absolute rank?
  • How do you handle a leaderboard reset that involves clearing a sorted set with millions of entries without blocking Redis?
  • How would you design a leaderboard that aggregates scores across multiple game modes?

11. What is Redis Pub/Sub, and when would you use it versus Redis Streams?

What the interviewer is really asking: Do you understand the fire-and-forget nature of pub/sub, the persistent nature of streams, and the use cases where each is appropriate?

Answer framework:

Redis Pub/Sub is a messaging system where publishers send messages to channels and subscribers receive messages from channels they are subscribed to. The key characteristic is fire-and-forget: messages are delivered to all currently connected subscribers and then discarded. If a subscriber is not connected when a message is published, it misses the message permanently. There is no message persistence, no acknowledgment, and no replay capability.

Pub/Sub is appropriate for real-time notifications where missing a message is acceptable: chat room presence updates, live sports score updates, cache invalidation broadcasts (tell all application instances to invalidate a specific key), and configuration change notifications. The throughput is high (Redis can handle millions of messages per second) and latency is low (sub-millisecond on the server side).

Pub/Sub limitations in Redis Cluster are notable: a PUBLISH on one node broadcasts the message to all nodes in the cluster, which then deliver it to local subscribers. This means pub/sub traffic scales linearly with cluster size, regardless of subscriber distribution. Redis 7.0 introduced sharded pub/sub (SSUBSCRIBE, SPUBLISH) that restricts message delivery to the shard that owns the channel, reducing cross-node traffic.

Redis Streams, introduced in Redis 5.0, are a persistent, log-based messaging system inspired by Apache Kafka. Messages are appended to a stream and retained until explicitly deleted or trimmed. Each message has a unique ID (timestamp-sequence) and contains field-value pairs. Streams support consumer groups: multiple consumers can divide the work of processing a stream, with Redis tracking which messages each consumer has processed and acknowledged.

Streams are appropriate when you need message persistence (consumers can read historical messages), at-least-once delivery (unacknowledged messages can be reclaimed by other consumers), consumer groups (parallel processing with work distribution), and replay capability (a new consumer can read the stream from the beginning).

The consumer group model works as follows: XREADGROUP reads new messages for a consumer within a group. Each message is delivered to exactly one consumer in the group. The consumer processes the message and sends XACK to acknowledge completion. If a consumer crashes without acknowledging, XPENDING lists unacknowledged messages, and XCLAIM reassigns them to another consumer after a timeout.
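
Sketched with redis-py (the stream, group, and consumer names and the handle function are assumptions):

    import redis

    r = redis.Redis(decode_responses=True)
    STREAM, GROUP, CONSUMER = "orders", "billing", "worker-1"

    # Create the group once; mkstream creates the stream if missing
    try:
        r.xgroup_create(STREAM, GROUP, id="0", mkstream=True)
    except redis.ResponseError:
        pass  # BUSYGROUP: the group already exists

    while True:
        # ">" asks only for messages never delivered to this group
        resp = r.xreadgroup(GROUP, CONSUMER, {STREAM: ">"}, count=10, block=5000)
        for _stream, messages in resp:
            for msg_id, fields in messages:
                handle(fields)                 # hypothetical processing step
                r.xack(STREAM, GROUP, msg_id)  # ack so it is not reclaimed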

The trade-off between streams and a dedicated message broker like Kafka is important. Redis Streams are simpler to operate (no separate infrastructure) and lower latency, but Kafka provides stronger durability guarantees (replicated log with configurable retention), higher throughput for sustained high-volume workloads, and more sophisticated consumer group management. For message volumes under 100,000 per second with retention needs of hours to days, Redis Streams are excellent. For higher volumes or longer retention, consider Kafka.

For messaging patterns in system design, see the distributed systems guide.

Follow-up questions:

  • How would you handle a Redis Stream that grows faster than consumers can process it?
  • What happens to pub/sub subscribers during a Redis failover?
  • How do you monitor consumer lag in Redis Streams?

12. How do you handle Redis in a microservices architecture?

What the interviewer is really asking: Can you design a Redis deployment strategy that serves multiple services without creating coupling, operational bottlenecks, or resource contention?

Answer framework:

The first architectural decision is shared versus dedicated Redis instances. A single shared Redis instance is simpler to operate but creates risks: one service's memory-intensive operations can evict another service's data, one service's slow Lua script blocks all services, and a keyspace collision (two services using the same key name) causes subtle bugs. Dedicated Redis instances per service provide isolation but multiply operational overhead.

The pragmatic middle ground is Redis instances per domain or bounded context, not per individual service. Services that share a caching layer for the same data (for example, a product service and a recommendation service both caching product data) can share an instance. Services with unrelated data get separate instances. Use Redis databases (SELECT 0 through SELECT 15) only for development environments, not production, because they share the same event loop and memory limit.

Key naming conventions prevent collisions and enable operational visibility: {service_name}:{entity}:{id}:{field}. For example, order-svc:order:1234:status. Consistent naming allows you to use SCAN with patterns to inspect one service's data without affecting others, and makes it clear in monitoring which service owns which keys.

Connection management is critical in microservices. With dozens of services each running multiple instances, connection counts multiply quickly. A Redis instance with maxclients set to 10,000 could be exceeded by 50 services each running 100 instances with 3 connections each. Use connection pooling in every service (pool size tuned to the service's concurrency level), and monitor connection counts. Consider using a Redis proxy like Twemproxy or Envoy's Redis filter to multiplex connections.
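
A pooling sketch in redis-py (endpoint, pool size, and timeouts are illustrative assumptions to tune per service):

    import redis

    # One pool per process; cap connections so that 100 instances of
    # this service cannot exhaust the server's maxclients on their own.
    pool = redis.ConnectionPool(
        host="redis.internal",  # assumed endpoint
        port=6379,
        max_connections=20,
        socket_timeout=0.5,
        socket_connect_timeout=0.5,
    )
    r = redis.Redis(connection_pool=pool)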

Service discovery and failover: in a containerized environment (Kubernetes), Redis instances might move between nodes. Use Redis Sentinel or Redis Cluster with DNS-based discovery. In AWS, ElastiCache provides managed failover. Ensure that your application's Redis client is configured to handle connection failures gracefully: retry with exponential backoff, use circuit breakers to prevent cascading failures, and fall back to the database when Redis is unavailable.

Data serialization affects performance across services. JSON is human-readable but verbose. MessagePack or Protocol Buffers are more compact and faster to serialize and deserialize. For cache entries that are read far more often than written, the serialization format's deserialization speed matters more than serialization speed. Benchmark with your actual data shapes.

For cross-service coordination patterns using Redis (distributed locks, event broadcasting, rate limiting), ensure that the Redis instance used for coordination has higher availability requirements than one used for pure caching. Treat coordination Redis instances like databases, not caches: use persistence, monitoring, and alerting. See the distributed cache with Redis design for architectural patterns.

Follow-up questions:

  • How do you handle the scenario where a shared Redis instance becomes a single point of failure for multiple services?
  • What is your strategy for migrating a service from a shared Redis instance to a dedicated one?
  • How do you manage Redis configuration drift across multiple instances in a microservices environment?

13. What are Lua scripts in Redis, and when should you use them?

What the interviewer is really asking: Do you understand the atomic execution model of Lua scripts, their performance characteristics, and the scenarios where they provide value over multi-round-trip alternatives?

Answer framework:

Lua scripts in Redis execute atomically on the server side. While a Lua script is running, no other command can execute. This provides transaction-like atomicity without the WATCH/MULTI/EXEC complexity and with the ability to include conditional logic. The script receives keys and arguments, executes Redis commands via redis.call(), and returns results to the client.

The primary use case is atomic operations that require multiple Redis commands with conditional logic. Consider rate limiting: you need to increment a counter, check if it exceeds a limit, and set an expiration if the key is new, all as a single atomic operation. Without Lua, a race condition exists between INCR and EXPIRE: if the process crashes between the two commands, the key never expires and the rate limit is permanent. A Lua script eliminates this race.
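
The fix is a few lines of Lua; a sketch via redis-py (key name and window are illustrative):

    import redis

    r = redis.Redis()

    # INCR and EXPIRE as one atomic unit: the key can never be left
    # without a TTL, even if the client dies between the two steps.
    RATE_LIMIT = r.register_script("""
    local count = redis.call('INCR', KEYS[1])
    if count == 1 then
      redis.call('EXPIRE', KEYS[1], ARGV[1])
    end
    return count
    """)

    count = RATE_LIMIT(keys=["rl:user:1234:min"], args=[60])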

Another use case is reducing network round trips. An operation that requires 5 sequential Redis commands (read a value, compute something, write the result, update another key, publish an event) requires 5 network round trips from the client. A Lua script accomplishes the same in a single round trip. For latency-sensitive paths, this reduction is significant: 5 round trips at 0.5ms each is 2.5ms, versus a single round trip of 0.5ms.

Lua scripts have important constraints and best practices. First, they block the entire Redis instance during execution. Keep scripts short (under 5 milliseconds). Long-running scripts cause the same problem as any other O(N) operation: all other clients wait. Use the lua-time-limit configuration (default 5 seconds) as a safety net. If a script exceeds this limit, Redis starts accepting SCRIPT KILL commands from other clients (but only if the script has not yet performed writes).

Second, in Redis Cluster, all keys accessed by a Lua script must be in the same hash slot. Pass all keys as KEYS arguments (not hardcoded in the script) so that Redis can verify slot membership. Third, Lua scripts should be deterministic and side-effect-free (beyond Redis calls) because they may be replicated to replicas by re-execution.

Script management: use SCRIPT LOAD to load a script and receive its SHA1 hash, then call EVALSHA with the hash for subsequent executions. This avoids sending the script text with every call. If a replica or restored node does not have the script cached, EVALSHA fails with NOSCRIPT, and the client should fall back to EVAL with the full script text.
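
The fallback pattern, sketched in redis-py (the script body is a trivial placeholder); note that redis-py's register_script wrapper performs this EVALSHA-then-EVAL dance automatically:

    import redis

    r = redis.Redis()

    SCRIPT = "return redis.call('GET', KEYS[1])"  # placeholder script
    sha = r.script_load(SCRIPT)

    def run(key):
        try:
            return r.evalsha(sha, 1, key)
        except redis.exceptions.NoScriptError:
            # Script cache was flushed (restart, failover): resend source
            return r.eval(SCRIPT, 1, key)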

Redis 7.0 introduced Redis Functions as an evolution of Lua scripts. Functions are stored persistently (they survive restarts and are replicated), organized in libraries, and can be loaded once rather than cached per execution. For new projects, prefer Redis Functions over EVAL scripts.

Follow-up questions:

  • How would you debug a Lua script that is producing incorrect results in production?
  • What happens if a Lua script calls a write command and then errors out midway?
  • How do you handle Lua script versioning when deploying changes to a running system?

14. How do you monitor Redis in production, and what metrics matter most?

What the interviewer is really asking: Do you have operational experience with Redis, and do you know which metrics predict problems before they become outages?

Answer framework:

Effective Redis monitoring covers four areas: performance, memory, replication, and client connections. The most important command is INFO, which returns a comprehensive snapshot of Redis's internal state.

Performance metrics: used_cpu_sys and used_cpu_user show CPU consumption. Since Redis is single-threaded, CPU utilization above 70 percent on the Redis process indicates that you are approaching the throughput ceiling. instantaneous_ops_per_sec shows current throughput. Track this over time to establish baselines and detect anomalies. latency monitoring (using LATENCY LATEST, LATENCY HISTORY, and CONFIG SET latency-monitor-threshold 10) captures commands that took longer than a threshold, helping identify slow commands. The slowlog (SLOWLOG GET) records the slowest commands with their execution time, which is invaluable for identifying problematic queries.

Memory metrics: used_memory shows the total memory allocated by Redis. Track this against maxmemory to know how close you are to the eviction threshold. mem_fragmentation_ratio (used_memory_rss / used_memory) reveals memory fragmentation. A ratio significantly above 1.5 indicates fragmentation that wastes memory. A ratio below 1.0 indicates Redis is using swap, which is a critical performance emergency because swapping makes Redis orders of magnitude slower. evicted_keys counts keys evicted due to memory pressure. A non-zero value in a caching workload is normal; in a session store, it indicates under-provisioning.

Replication metrics: master_repl_offset versus the replica's offset shows replication lag in bytes. High or increasing lag indicates that the replica cannot keep up with write throughput, risking data loss during failover. connected_slaves and their state tell you whether your high-availability setup is intact. A master with zero connected slaves has no failover capability.

Client metrics: connected_clients shows current connections. Track against maxclients (default 10,000). blocked_clients shows clients waiting on blocking commands (BRPOP, BLPOP). A high number suggests queue consumers are not keeping up. rejected_connections indicates clients were turned away because maxclients was reached; treat any non-zero value as a critical alert.

Keyspace metrics: keyspace_hits and keyspace_misses give you the cache hit ratio (hits / (hits + misses)). A hit ratio below 80 percent for a cache workload suggests either the working set exceeds cache size or TTLs are too aggressive. expired_keys tracks TTL-based expirations.
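
Computing the hit ratio from INFO is a one-liner in practice; a sketch:

    import redis

    r = redis.Redis()
    stats = r.info("stats")
    hits, misses = stats["keyspace_hits"], stats["keyspace_misses"]
    total = hits + misses
    print(f"hit ratio: {hits / total:.2%}" if total else "no lookups yet")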

Set up alerting on: memory usage above 80 percent of maxmemory, replication lag exceeding your SLA, cache hit ratio dropping below baseline, connected_clients approaching maxclients, and CPU utilization above 70 percent. Use tools like Redis Exporter for Prometheus, Datadog Redis integration, or the built-in Redis monitoring commands.

For understanding what drives these metrics at a deeper level, see how Redis works.

Follow-up questions:

  • How would you diagnose a sudden increase in Redis latency that correlates with no change in traffic?
  • What is your approach to capacity planning for Redis, and how far ahead do you plan?
  • How do you handle Redis monitoring in a Redis Cluster with dozens of nodes?

15. When should you NOT use Redis, and what alternatives would you recommend?

What the interviewer is really asking: Do you have the judgment to recognize Redis's limitations and recommend better tools when appropriate, rather than defaulting to Redis for everything?

Answer framework:

Redis is not the right choice when your dataset significantly exceeds available memory. Redis stores everything in RAM, and while techniques like data compression and compact encodings help, a dataset of hundreds of gigabytes or terabytes is impractical in Redis. For large key-value workloads, consider RocksDB-based stores like TiKV, or SSTable-based stores like Cassandra. For caching with datasets larger than memory, consider Memcached with its slab allocator that handles memory more predictably, or see Redis vs Memcached for a detailed comparison.

Redis is a poor choice for complex queries. If you need to filter, join, aggregate, or perform ad-hoc queries on your data, a relational database or a document store like MongoDB will serve you far better. Redis's query capabilities are limited to key lookups, sorted set range queries, and set operations. Attempting to model complex query patterns in Redis leads to excessive denormalization, complex Lua scripts, and maintenance nightmares. Use the right tool: SQL vs NoSQL covers these trade-offs.

Redis is risky as a primary data store for data you cannot afford to lose. Despite persistence options (RDB and AOF), Redis's asynchronous replication means acknowledged writes can be lost during failover. For financial transactions, user-generated content, or any data where loss is unacceptable, use a database with synchronous replication (PostgreSQL with synchronous standby, or a distributed database like CockroachDB). Use Redis as a cache or acceleration layer in front of a durable primary store.

Redis is not ideal for large objects. Storing individual values larger than 1MB causes network latency, memory fragmentation, and blocks other operations during transfer. For blob storage, use object storage (S3) or a blob store with Redis holding only metadata and references.

Redis pub/sub is not a replacement for a real message broker when you need message persistence, delivery guarantees, or consumer group semantics beyond what Redis Streams provides. For high-volume event streaming with retention requirements, use Kafka or Pulsar. Redis Streams fill a middle ground for lower-volume messaging needs.

Redis is not suitable for search workloads. While Redis Search (a module) adds full-text search capabilities, it is not a replacement for Elasticsearch or Solr for complex search requirements with faceting, relevance tuning, and large document collections. Understand the boundaries of your tools. For indexing concepts, see how database indexing works.

The general principle: use Redis for what it excels at (fast, in-memory data structure operations for caching, session management, rate limiting, real-time analytics, and coordination) and pair it with durable storage systems for persistent data. The best architectures use Redis as an accelerator, not a replacement for a database.

Follow-up questions:

  • How would you make the case to your team for removing Redis from a system where it was incorrectly used as a primary store?
  • What is the total cost of running Redis at scale, including operational overhead, and how does it compare to alternatives?
  • How do you evaluate new Redis modules and features to decide whether they are production-ready?

Common Mistakes in Redis Interviews

  1. Treating Redis as just a cache. Redis's data structures (sorted sets, streams, HyperLogLog, geospatial indexes) enable sophisticated use cases far beyond caching. Demonstrating knowledge of only GET, SET, and TTL signals shallow experience.

  2. Ignoring persistence and durability trade-offs. Saying you would store critical data in Redis without discussing persistence configuration, replication lag, and potential data loss shows a lack of production awareness. Always articulate what happens when Redis restarts or fails over.

  3. Not understanding single-threaded implications. Using KEYS * in production, running unbounded LRANGE operations, or writing long-running Lua scripts can block your entire Redis instance. Senior engineers know the time complexity of every command they use.

  4. Over-relying on Redlock for distributed locking. Presenting Redlock as a solved problem without acknowledging its limitations (timing assumptions, GC pauses, clock drift) misses the nuance that interviewers expect at the senior level.

  5. Forgetting about memory. Not monitoring memory usage, not understanding eviction policies, or not planning for the memory overhead of persistence (fork-based RDB snapshots require additional memory for copy-on-write pages) leads to production incidents.

How to Prepare for Redis Interview Questions

Build practical experience beyond tutorials. Set up a Redis Cluster with Sentinel, configure both RDB and AOF persistence, and observe what happens during failover. Use redis-benchmark to understand the throughput characteristics of different commands and data sizes. Trigger memory pressure and observe eviction behavior. These experiments build the intuition that interviewers test for.

Study Redis internals by reading the source code of key data structures. Understanding that sorted sets use skip lists (not balanced binary trees), that small hashes use ziplist encoding, and that strings under 44 bytes use embedded encoding gives you answers that stand out. The how Redis works deep dive covers these internals.

Practice designing systems that use Redis as a component. Design a rate limiter, a distributed lock, a leaderboard, a session store, and a real-time analytics pipeline. For each, consider what happens when Redis is unavailable, when memory fills up, and when traffic spikes 10x. This systems-thinking approach is what senior interviews test.

Review Redis's release notes for versions 6.0 through 7.2 to understand recent features: I/O threading, Redis Functions, ACLs, client-side caching, and sharded pub/sub. Interviewers at top companies expect you to know current capabilities, not just the Redis 3.x feature set.

For a comprehensive preparation plan, explore the learning paths and the distributed systems guide. Complement your Redis knowledge with broader system design skills using our system design interview guide. Review pricing options to access detailed practice problems and mock interview scenarios.
