
Database Replication Interview Questions for Senior Engineers (2026)

15 advanced database replication interview questions with detailed answer frameworks covering leader-follower replication, multi-leader setups, conflict resolution, replication lag, and production patterns used at Google, Amazon, and other top companies.

20 min read · Updated Apr 21, 2026
Tags: interview-questions, database-replication, distributed-systems, senior-engineer, databases

Why Database Replication Is Essential Knowledge for Senior Engineers

Database replication is one of the pillars of modern distributed systems. Every production database at scale uses some form of replication, whether for high availability, disaster recovery, read scaling, or geographic distribution. In senior engineering interviews, replication questions reveal whether you understand the fundamental trade-offs between consistency, availability, and performance that govern every distributed data system.

Unlike more theoretical distributed systems topics, replication is deeply practical. Every senior engineer has encountered replication lag, stale reads, failover incidents, or conflict resolution challenges in production. Interviewers expect you to draw on this experience and reason about replication not as an abstract concept but as a real operational concern with measurable consequences.

At companies like Google and Amazon, replication strategies directly impact user experience: a poorly configured replication setup can cause users to see stale data, lose writes during failover, or experience inconsistent behavior across geographic regions. The questions in this guide are designed to probe these real-world dimensions of replication, from the fundamentals of synchronous versus asynchronous replication through to the complexities of multi-leader conflict resolution and leaderless quorum systems.

Each question includes the interviewer's underlying intent, a structured answer framework that demonstrates senior-level depth, and follow-up questions that push into advanced territory. Whether you are preparing for a deep dive on how database replication works or a system design round involving a platform like Amazon's e-commerce service, this guide will strengthen your ability to reason about replication under interview pressure. For a comprehensive preparation approach, see our system design interview guide and explore the learning paths tailored to senior engineers.

1. What is database replication and what are the primary motivations for using it?

What the interviewer is really asking: Can you articulate the distinct reasons for replication and do you understand that each motivation leads to different architectural choices?

Answer framework:

Database replication is the process of maintaining copies of the same data on multiple database servers. While the definition is simple, the motivations for replication are diverse, and each motivation drives different design decisions.

High availability: if the primary database server fails, a replica can be promoted to take its place, minimizing downtime. This is the most common motivation and is relevant for any production system with uptime requirements. The key design question is failover time: how quickly can a replica be promoted to primary, and how much data (if any) is lost during failover?

Read scaling: for read-heavy workloads, replicas can serve read traffic, distributing the load across multiple servers. A single PostgreSQL or MySQL primary might handle 5,000 queries per second, but with 10 replicas, the system can handle 50,000 read queries per second. The key design question is consistency: how stale can reads be, and which queries can tolerate stale data?

Geographic distribution: placing replicas in data centers close to users reduces read latency. A user in Tokyo reading from a local replica experiences 5ms latency versus 200ms reading from a primary in Virginia. The key design question is how writes are handled: must all writes go to a central primary, or can local replicas accept writes (multi-leader)?

Disaster recovery: replicas in a different geographic region protect against catastrophic failures like data center outages, natural disasters, or regional network failures. The key design question is the Recovery Point Objective (RPO): how much data you can afford to lose, which in turn determines whether replication must be synchronous.

Understanding these motivations is crucial because they lead to fundamentally different replication architectures. High availability with zero data loss requires synchronous replication. Read scaling typically uses asynchronous replication with eventual consistency. Geographic distribution may require multi-leader replication. Disaster recovery requires cross-region replication with carefully configured RPO and RTO targets.

Follow-up questions:

  • How do you determine the right number of replicas for a given workload?
  • What is the operational cost of maintaining replicas beyond the hardware expense?
  • How does replication interact with database sharding in a system that needs both?

2. Explain the differences between synchronous and asynchronous replication. When would you choose each?

What the interviewer is really asking: Do you understand the fundamental trade-off between data safety and write performance, and can you make the right choice for different business requirements?

Answer framework:

In synchronous replication, a write is not acknowledged to the client until it has been persisted on both the primary and at least one replica. The write path is: client sends write to primary, primary writes locally, primary sends write to replica, replica writes and acknowledges, primary acknowledges to client. This guarantees that the replica always has the latest data, so failover loses zero data. The cost is higher write latency because every write waits for a network round trip to the replica and the replica's disk write. If the replica is in a different region, this adds significant latency.

In asynchronous replication, the primary acknowledges the write to the client as soon as it is persisted locally, then sends the write to replicas in the background. This gives the lowest write latency but creates a replication lag window where the replica is behind the primary. If the primary fails during this window, those writes are lost.

Semi-synchronous replication is a pragmatic middle ground used in production at many companies. One replica is synchronous (guaranteeing at least one up-to-date copy), and the remaining replicas are asynchronous (for read scaling). If the synchronous replica falls behind, another replica is promoted to synchronous. MySQL supports this natively.

The decision framework: use synchronous replication when data loss is unacceptable, such as financial transactions, payment ledgers, or regulatory compliance data. Accept the higher write latency as a cost of safety. Use asynchronous replication when write performance is critical and some data loss during failover is tolerable, such as session stores, activity logs, or content feeds. Use semi-synchronous when you need a balance: zero data loss guarantee with one replica while maintaining reasonable write performance.

Quantify the trade-off: synchronous replication within the same data center adds 1-5ms to write latency. Cross-region synchronous replication adds 50-200ms depending on distance. For a system doing 10,000 writes per second, that additional latency translates directly to higher tail latencies and lower throughput.

Follow-up questions:

  • What happens to write availability when the synchronous replica goes down?
  • How would you implement synchronous replication for a subset of critical tables while using async for others?
  • How does the choice between synchronous and asynchronous affect your RTO and RPO calculations?

3. How does leader-follower (primary-replica) replication work, and what are its failure modes?

What the interviewer is really asking: Can you explain the most common replication topology in detail, including the subtle failure scenarios that cause real production incidents?

Answer framework:

Leader-follower replication is the most widely used topology. A single leader (primary) accepts all writes. Followers (replicas) receive a stream of changes from the leader and apply them in order. Reads can be served by the leader or any follower.

The replication mechanism varies by database. PostgreSQL writes every change to the Write-Ahead Log (WAL) and builds two replication modes on top of it. Physical (streaming) replication ships exact byte-level WAL records for followers to replay; it is fast and simple but ties the replica to the same PostgreSQL major version. Logical replication decodes the WAL into logical changes (INSERT row, UPDATE row) and allows different versions, selective table replication, and cross-database-engine replication.

In MySQL, the leader writes changes to the binary log (binlog) in either statement-based (the SQL statement itself), row-based (the actual before/after row values), or mixed format. Row-based is the most reliable because statement-based replication can produce different results if non-deterministic functions are used.

Failure mode one: replication lag. If the follower cannot keep up with the leader's write rate due to slow hardware, expensive queries, or network issues, the lag grows. This means reads from the follower return increasingly stale data. Monitor replication lag continuously and alert when it exceeds a threshold.

Failure mode two: leader failure. When the leader fails, a follower must be promoted. This raises the split-brain problem: if the old leader recovers and still thinks it is the leader, two nodes accept writes simultaneously, causing data divergence. Fencing mechanisms (STONITH: Shoot The Other Node In The Head) ensure the old leader cannot accept writes after failover.

Failure mode three: replication divergence. If a follower applies a transaction differently than the leader due to a bug, non-deterministic function, or configuration difference, the follower's data silently diverges. Detect this with periodic checksum comparisons using tools like pt-table-checksum.

Failure mode four: network partition between leader and followers. Followers stop receiving updates, lag grows unbounded. If reads are served from these followers, users see increasingly stale data. Implement circuit breakers that stop routing reads to followers with excessive lag.

Follow-up questions:

  • How would you design an automated failover system that avoids split-brain?
  • What tools and monitoring would you use to detect replication divergence early?
  • How do you handle the scenario where a follower crashes and needs to be rebuilt from scratch?

4. What is replication lag and how do you mitigate its effects on application behavior?

What the interviewer is really asking: Have you dealt with the real-world consequences of asynchronous replication and can you implement practical solutions at the application level?

Answer framework:

Replication lag is the time delay between when a write is committed on the leader and when it is visible on a follower. In a healthy system with asynchronous replication, lag is typically under 1 second. But under load, during long transactions, or with network issues, lag can grow to seconds, minutes, or even hours.

The user-visible consequence is read-your-writes inconsistency. A user submits a form (write to leader), the page refreshes (read from follower), and their changes are not visible because the follower has not replicated the write yet. This creates confusion and erodes trust in the application.

Mitigation strategy one: read-your-writes consistency. After a write, route that user's subsequent reads to the leader for a configurable window, typically 5 to 10 seconds. Implement this by setting a session cookie with the timestamp of the last write. The routing layer compares this timestamp with each follower's replication position and routes to the leader if no follower is sufficiently caught up.
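
To make this concrete, here is a minimal Python sketch of read-your-writes routing. The replica objects are hypothetical stand-ins that expose a replication position and lag; in a real system those values would come from something like PostgreSQL's pg_last_wal_replay_lsn() or MySQL's SHOW REPLICA STATUS.

```python
import time

class ReadYourWritesRouter:
    """Route a user's reads to the leader until some follower has caught up
    to that user's last write. Follower objects are hypothetical stand-ins
    exposing replicated_position() and lag_seconds()."""

    def __init__(self, leader, followers, window_seconds=10):
        self.leader = leader
        self.followers = followers
        self.window = window_seconds
        self.last_write_pos = {}   # user_id -> (commit position, wall-clock time)

    def record_write(self, user_id, commit_position):
        # Called after every write; the position comes from the leader's commit.
        self.last_write_pos[user_id] = (commit_position, time.monotonic())

    def choose_replica_for_read(self, user_id):
        entry = self.last_write_pos.get(user_id)
        if entry is None:
            return self._least_lagged_follower()
        position, written_at = entry
        if time.monotonic() - written_at > self.window:
            # Outside the stickiness window: assume replication has caught up.
            return self._least_lagged_follower()
        # Prefer a follower that has already replicated past the user's write.
        for follower in self.followers:
            if follower.replicated_position() >= position:
                return follower
        return self.leader   # nobody has caught up yet: fall back to the leader

    def _least_lagged_follower(self):
        return min(self.followers, key=lambda f: f.lag_seconds(), default=self.leader)
```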

Mitigation strategy two: monotonic reads. Ensure that a user never sees data go backward. If a user reads from follower A and sees data at timestamp T, subsequent reads must see data at timestamp T or later. Implement this with session affinity: pin each user session to a specific follower so reads within a session always go to the same replica.

Mitigation strategy three: causal consistency. If user A posts a comment and user B loads the page, user B should see the comment if user B's request was caused by user A's action (for example, user A shared a link that user B clicked). Implement this by passing a causal timestamp or version vector between related requests.

Mitigation strategy four: lag-aware routing. Monitor each follower's lag in real-time. The routing layer excludes followers with lag exceeding a threshold from the read pool. If all followers exceed the threshold, route reads to the leader with rate limiting to prevent leader overload.

Mitigation strategy five: application design. For critical workflows where consistency matters (account balance, order status), always read from the leader. For non-critical reads (product catalog, recommendations), accept eventual consistency. This requires classifying each query by its consistency requirements.

Follow-up questions:

  • How do you handle replication lag in a microservices architecture where multiple services read from different replicas?
  • What happens when replication lag exceeds your monitoring threshold and how do you recover?
  • How does replication lag interact with caching layers?

5. How does multi-leader (multi-primary) replication work and when would you use it?

What the interviewer is really asking: Do you understand the complexity and risks of accepting writes at multiple nodes, including the conflict resolution challenges that make multi-leader replication notoriously difficult?

Answer framework:

Multi-leader replication allows writes at multiple nodes, with each leader asynchronously replicating its writes to all other leaders. This topology is used primarily for two scenarios: multi-datacenter operation where each datacenter has a local leader to minimize write latency, and offline-capable applications where clients can write locally and sync later.

In a multi-datacenter setup, each datacenter has a leader that accepts writes from local clients. Users in Europe write to the European leader with low latency, users in Asia write to the Asian leader. Each leader replicates to all other leaders asynchronously. This provides better write latency than single-leader replication where all writes must travel to one datacenter.

The fundamental challenge is write conflicts. If user A updates a record to value X on leader 1 and user B concurrently updates the same record to value Y on leader 2, both writes succeed locally. When the writes replicate to the other leader, there is a conflict. The system must decide which value wins.

Conflict resolution strategies include last-writer-wins (LWW) based on timestamp, where the write with the later timestamp is kept and the other is silently discarded. This is simple but loses data. It is used by Cassandra and DynamoDB by default. Another strategy is merge or concatenate values, which works for some data types like sets (union the values) but not for most. Custom application logic resolution stores both conflicting values and presents them to the user or application for resolution, like Git merge conflicts. CRDTs (Conflict-free Replicated Data Types) are data structures designed so that concurrent operations always converge to the same state without explicit conflict resolution.
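
As an illustration of how simple, and how lossy, last-writer-wins is, here is a minimal sketch of a timestamp-based LWW merge with a node-id tiebreak. The VersionedValue shape and node ids are illustrative, not any particular database's internal format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VersionedValue:
    value: str
    timestamp_ms: int   # wall-clock commit time assigned by the accepting leader
    node_id: str        # tiebreaker so concurrent writes resolve deterministically

def lww_merge(local: VersionedValue, remote: VersionedValue) -> VersionedValue:
    """Last-writer-wins: keep the write with the later timestamp. The losing
    write is silently discarded, which is exactly the data-loss risk described
    above. Node id breaks exact-timestamp ties so both sides pick the same winner."""
    if remote.timestamp_ms != local.timestamp_ms:
        return remote if remote.timestamp_ms > local.timestamp_ms else local
    return remote if remote.node_id > local.node_id else local

# Two leaders accept concurrent updates to the same record:
a = VersionedValue("shipping=express", timestamp_ms=1_700_000_000_500, node_id="eu-1")
b = VersionedValue("shipping=standard", timestamp_ms=1_700_000_000_420, node_id="ap-1")
assert lww_merge(a, b) == lww_merge(b, a) == a   # both leaders converge on the same winner
```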

Conflict avoidance is usually better than conflict resolution. Partition writes so that all writes to a particular record always go to the same leader. For example, route all writes for a user to the leader in the user's home region. This eliminates conflicts for per-user data but does not help for shared data.

Multi-leader replication is used in practice by systems like CouchDB (designed for offline-first applications), MySQL Group Replication, and PostgreSQL BDR (Bi-Directional Replication). Amazon's DynamoDB uses a form of multi-leader replication in its global tables feature, employing last-writer-wins conflict resolution.

Follow-up questions:

  • How would you implement conflict detection and resolution for a collaborative document editing system?
  • What are the consistency guarantees you can provide with multi-leader replication?
  • How do you test a multi-leader setup to ensure conflict resolution works correctly?

6. What is leaderless replication and how do quorum reads and writes work?

What the interviewer is really asking: Can you explain the Dynamo-style replication model, including the mathematical properties of quorum systems and their practical implications?

Answer framework:

Leaderless replication eliminates the concept of a primary node. Any replica can accept reads and writes. This model was popularized by Amazon's Dynamo paper and is implemented in Cassandra, Riak, and DynamoDB. Understanding the CAP theorem is essential for reasoning about leaderless systems.

In a leaderless system with N replicas, the client sends each write to all N replicas and considers the write successful when W replicas acknowledge. For reads, the client reads from all N replicas and considers the read successful when R replicas respond. The client uses version numbers to determine which response is the most recent.

The quorum condition is W + R > N. When this condition holds, at least one of the R read replicas must have the latest write, guaranteeing that the client sees up-to-date data. Common configurations: N=3, W=2, R=2 provides a balance of consistency and availability. N=3, W=1, R=3 optimizes for fast writes but requires reading all replicas. N=3, W=3, R=1 optimizes for fast reads but requires all replicas to be available for writes.
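
A toy in-memory model makes the overlap property easy to see. The version numbers, availability lists, and storage below are stand-ins for illustration, not a real Dynamo-style client.

```python
class QuorumKV:
    """Toy leaderless store: N in-memory replicas, write to W, read from R.
    Versions are monotonically increasing integers chosen by the writer."""

    def __init__(self, n=3, w=2, r=2):
        assert w + r > n, "W + R must exceed N so read and write quorums overlap"
        self.replicas = [dict() for _ in range(n)]   # each replica: key -> (version, value)
        self.n, self.w, self.r = n, w, r

    def write(self, key, value, version, available):
        # 'available' simulates which replicas are reachable for this request.
        acks = 0
        for i in available:
            self.replicas[i][key] = (version, value)
            acks += 1
        if acks < self.w:
            raise RuntimeError(f"write failed: quorum of {self.w} not reached")

    def read(self, key, available):
        if len(available) < self.r:
            raise RuntimeError(f"read failed: quorum of {self.r} not reached")
        responses = [self.replicas[i].get(key) for i in available[: self.r]]
        responses = [rsp for rsp in responses if rsp is not None]
        if not responses:
            return None
        # Highest version wins; a production system would also trigger read repair here.
        return max(responses, key=lambda rsp: rsp[0])[1]

store = QuorumKV(n=3, w=2, r=2)
store.write("cart:42", ["book"], version=1, available=[0, 1])   # replica 2 missed the write
print(store.read("cart:42", available=[1, 2]))                  # overlap guarantees replica 1 is consulted
```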

Even with quorum, consistency is not guaranteed in all edge cases. Sloppy quorums allow writes to nodes that are not the designated replicas during a network partition, using hinted handoff to eventually transfer the data to the correct node. This improves availability but weakens the consistency guarantee. Read repair detects stale values during reads and updates the stale replica, but only works for data that is actively read. Anti-entropy processes run in the background comparing data between replicas and fixing discrepancies.

The practical trade-offs: leaderless replication provides high write availability because there is no single leader to fail. However, it complicates read logic because the client must handle responses from multiple replicas, manage version conflicts, and perform read repair. It also uses more network bandwidth because every operation touches multiple nodes.

Leaderless replication is well-suited for workloads where high write availability is critical, data can be modeled as independent key-value pairs, and eventual consistency is acceptable. It is less suitable for workloads requiring strong consistency, transactions, or complex queries. For applications needing both high availability and stronger consistency, consider a SQL vs NoSQL analysis to pick the right storage engine.

Follow-up questions:

  • What happens when a quorum cannot be achieved because too many replicas are down?
  • How do sloppy quorums interact with the consistency guarantees of the system?
  • How would you implement a distributed counter using leaderless replication without losing increments?

7. How do you design and execute a database failover strategy?

What the interviewer is really asking: Have you managed failovers in production? Do you understand the automation, monitoring, and safeguards needed to failover without data loss or extended downtime?

Answer framework:

Database failover is the process of promoting a replica to primary when the current primary fails. Despite sounding straightforward, failover is one of the most incident-prone operations in database management.

Automatic versus manual failover: automatic failover reduces downtime because it does not wait for a human. However, false positives, where the monitoring system incorrectly declares the primary dead, can trigger unnecessary failovers that cause more problems than they solve. A conservative approach uses automatic detection with manual confirmation for non-emergency cases, and fully automatic failover only for clear-cut failures like the primary process crashing.

The failover process involves several steps. First, failure detection: monitor the primary with health checks from multiple vantage points. A single failed health check could be a network blip. Require multiple consecutive failures from multiple monitors before declaring the primary dead. Use a consensus mechanism among monitoring nodes to avoid split-brain in the monitoring layer itself.

Second, replica selection: choose the replica with the least replication lag to minimize data loss. In PostgreSQL, compare the WAL position of each replica to the last known position of the primary. The replica closest to the primary's position has the most data.
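
A minimal sketch of this selection step, assuming each candidate reports a comparable numeric replication position (for example a WAL LSN converted to a byte offset) through your monitoring system; the data shape and node names are placeholders.

```python
def select_promotion_candidate(replicas):
    """Pick the healthy replica with the highest replayed position (least data loss).
    Each entry is assumed to look like:
      {"name": "replica-2", "replayed_position": 982401355, "healthy": True}
    where the position is a comparable integer. This is a sketch, not a failover tool."""
    candidates = [r for r in replicas if r["healthy"]]
    if not candidates:
        raise RuntimeError("no healthy replica available for promotion")
    return max(candidates, key=lambda r: r["replayed_position"])

replicas = [
    {"name": "replica-1", "replayed_position": 982_401_112, "healthy": True},
    {"name": "replica-2", "replayed_position": 982_401_355, "healthy": True},
    {"name": "replica-3", "replayed_position": 981_990_004, "healthy": False},  # unreachable
]
print(select_promotion_candidate(replicas)["name"])   # replica-2 has replayed the most WAL
```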

Third, promotion: promote the selected replica to primary. This involves stopping replication, making the replica writable, and updating the connection endpoint (DNS, VIP, or proxy configuration) to point to the new primary. The promotion itself takes seconds, but DNS propagation can take minutes. Use a proxy like PgBouncer or HAProxy that can switch backends instantly.

Fourth, fencing: ensure the old primary cannot accept writes. If the old primary recovers, it must not think it is still the primary. Use STONITH (power off the old primary), revoke its write permissions, or use a distributed lock that the old primary no longer holds.

Fifth, client reconnection: applications must reconnect to the new primary. Connection pools detect closed connections and reestablish them, but this causes a brief period of errors. Design applications with retry logic and circuit breakers to handle this gracefully.

Data loss during failover depends on the replication mode. With synchronous replication, the promoted replica has all committed data. With asynchronous replication, any data in the leader's WAL that had not been replicated is lost. Quantify this risk: if average replication lag is 100ms and write throughput is 1,000 writes per second, approximately 100 writes could be lost in a failover.

Follow-up questions:

  • How do you handle failover when the primary is not dead but unreachable due to a network partition?
  • What testing strategy do you use to validate your failover process?
  • How do you reintegrate the old primary after failover without data loss?

8. What are the challenges of cross-region database replication?

What the interviewer is really asking: Can you reason about the physics-level constraints of wide-area networking and design replication strategies that work across continents?

Answer framework:

Cross-region replication introduces challenges that do not exist within a single data center, driven primarily by the speed of light. A round trip between the US East Coast and US West Coast over fiber takes roughly 60-70ms, and a trans-Atlantic round trip typically takes 70-100ms. These numbers are physical constraints that cannot be engineered away.

For synchronous cross-region replication, every write incurs this latency penalty. A system that handles 5,000 writes per second within a single region might drop to 500 writes per second with synchronous cross-region replication because each write blocks for 100+ ms. This is why most cross-region replication is asynchronous, with the understanding that a regional failure may lose the most recent writes that had not yet replicated.

Bandwidth is another constraint. Cross-region network links have limited bandwidth and higher costs than intra-region links. The replication stream must be efficient: send only the changes (logical replication), compress the stream, and batch small writes together.

Conflict resolution becomes mandatory in multi-region active-active setups where both regions accept writes. As discussed in the multi-leader question, you must choose between last-writer-wins (simple, lossy), custom resolution logic (complex, business-specific), or conflict avoidance through geographic partitioning.

Regulatory compliance adds another dimension. GDPR, data residency laws, and industry regulations may require that certain data never leaves a specific region. Your replication strategy must respect these constraints, potentially using partial replication where only non-sensitive data is replicated globally while PII stays in the home region.

For systems that need strong cross-region consistency, Google Spanner's TrueTime approach uses GPS-synchronized atomic clocks to provide externally consistent reads and writes across regions. This is the gold standard but requires specialized hardware. CockroachDB implements a similar model using hybrid logical clocks without specialized hardware, though with slightly weaker guarantees.

Design patterns for cross-region replication include active-passive where one region handles all writes and replicates to the other for disaster recovery; active-active where both regions accept writes with conflict resolution; follow-the-sun where the write leader migrates based on which region has the most traffic at a given time.

Follow-up questions:

  • How would you design a system that provides read-your-writes consistency for a globally distributed user base?
  • What is your strategy for testing cross-region failover?
  • How do you monitor cross-region replication lag and what are your alerting thresholds?

9. How does replication work in a sharded database, and what are the compounding complexities?

What the interviewer is really asking: Can you reason about the interaction between two fundamental scaling strategies, sharding and replication, and understand how their complexities multiply?

Answer framework:

In a sharded database, each shard is itself a replicated unit. If you have 10 shards each with 3 replicas (1 leader + 2 followers), you are managing 30 database instances. This multiplicative complexity is the primary operational challenge.

Topology: each shard runs its own independent leader-follower replication. Shard 1's leader replicates to shard 1's followers. Shard 2's leader replicates to shard 2's followers. There is no replication between different shards because they hold different data. The routing layer must know both the shard assignment (which shard holds the data) and the replica topology (which node is the leader for that shard).
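
A simplified sketch of such a routing layer, combining hash-based shard assignment with per-shard, lag-aware replica selection. The node names, lag values, and topology map are placeholders for whatever your service discovery provides.

```python
import hashlib

class ShardedRouter:
    """Route a query to the right shard, then to the right replica within it.
    'topology' maps shard id -> {"leader": node, "followers": [(node, lag_seconds), ...]}."""

    def __init__(self, topology, max_follower_lag=2.0):
        self.topology = topology
        self.num_shards = len(topology)
        self.max_lag = max_follower_lag

    def shard_for(self, key: str) -> int:
        digest = hashlib.md5(key.encode()).hexdigest()
        return int(digest, 16) % self.num_shards

    def route_write(self, key: str) -> str:
        # All writes go to the owning shard's current leader.
        return self.topology[self.shard_for(key)]["leader"]

    def route_read(self, key: str) -> str:
        shard = self.topology[self.shard_for(key)]
        fresh = [node for node, lag in shard["followers"] if lag <= self.max_lag]
        # Fall back to the leader when every follower of this shard is lagging.
        return fresh[0] if fresh else shard["leader"]

topology = {
    0: {"leader": "db-0-a", "followers": [("db-0-b", 0.3), ("db-0-c", 0.8)]},
    1: {"leader": "db-1-a", "followers": [("db-1-b", 7.5), ("db-1-c", 9.1)]},  # both lagging
}
router = ShardedRouter(topology)
print(router.route_write("user:1841"))   # leader of whichever shard owns this key
print(router.route_read("user:1841"))    # fresh follower, or the leader if all are stale
```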

Failover complexity: each shard can fail independently. Shard 3's leader might fail while all other shards are healthy. The failover system must handle per-shard failover, promoting shard 3's most up-to-date follower without affecting other shards. But if the cause is a correlated failure like a rack power outage, multiple shard leaders might fail simultaneously, requiring coordinated failover.

Replication lag monitoring: you must track replication lag per shard. A query router that considers lag must know the lag of each follower of each shard. If shard 5's followers are all lagging, reads for data on shard 5 should go to the leader while reads for other shards can go to followers.

Cross-shard consistency: if a transaction spans shards 2 and 7, and both shards use asynchronous replication, a reader on the followers might see the shard 2 portion of the transaction but not the shard 7 portion, creating a temporarily inconsistent view. Mitigating this requires either reading from leaders for cross-shard queries or using a global transaction timestamp that ensures all replicas have caught up before serving the read.

Schema migrations multiply: a migration must be applied to all replicas of all shards. With 10 shards and 3 replicas each, that is 30 migration executions. A failure on any one must be detected and handled. Rolling migrations, where you migrate one shard at a time, are safer but take 10 times as long.

Capacity planning becomes multi-dimensional: you must plan for both horizontal growth (adding shards for more data) and replication growth (adding followers for more read capacity). These interact: adding a single follower to every shard adds as many new database instances as you have shards.

Follow-up questions:

  • How would you automate failover for a sharded-replicated database with 100 shards?
  • How do you handle the case where resharding requires changing the replication topology?
  • What is your monitoring strategy for a system with hundreds of database instances across shards and replicas?

10. What is change data capture (CDC) and how does it relate to database replication?

What the interviewer is really asking: Do you understand how the replication log can be used as a building block for broader system architecture, not just for maintaining replicas?

Answer framework:

Change data capture is the process of identifying and capturing changes made to a database and delivering those changes to downstream systems in real-time. CDC leverages the same replication log that database replication uses: the WAL in PostgreSQL, the binlog in MySQL, the oplog in MongoDB.

The key insight is that the replication log is an ordered stream of every change to the database. By reading this stream, you can replicate data to systems other than database replicas: search indexes, caches, data warehouses, message queues, and analytics pipelines. This makes CDC a foundational pattern for distributed systems.

Log-based CDC reads the database's replication log directly. Tools like Debezium, Maxwell, and AWS DMS connect to the database as a replication follower and stream changes to Kafka or other message brokers. The advantages are low overhead on the source database (it is already writing the log), complete capture of all changes (nothing is missed), and preservation of transaction boundaries and ordering.

Trigger-based CDC uses database triggers to capture changes and write them to a change table. This is database-agnostic but adds write amplification to every transaction and can miss changes made outside the application such as manual fixes or migration scripts.

Query-based CDC polls the database for changes using a timestamp or sequence column. This is the simplest to implement but can miss deletes (the row is gone), has polling latency, and puts load on the database.

CDC enables powerful architectural patterns. Event sourcing: use the change stream as the definitive event log. CQRS: stream changes from the write database to read-optimized projections. Cache invalidation: invalidate or update cache entries based on database changes rather than TTL-based expiration. Real-time analytics: stream changes to a data warehouse for near-real-time business intelligence. Microservice data synchronization: propagate data changes across service boundaries without tight coupling.

Practical considerations include schema evolution: when the source table schema changes, CDC consumers must handle the new format. Debezium includes schema information in each change event. Initial snapshot: for a new CDC consumer, you need the full current state of the table, not just future changes. Most CDC tools support an initial snapshot followed by continuous streaming. Exactly-once delivery: most CDC pipelines provide at-least-once delivery, meaning consumers must be idempotent.
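
Because delivery is at-least-once, the consumer's apply logic must tolerate duplicates. Here is a minimal sketch of an idempotent apply loop; the event shape is loosely modeled on a Debezium-style envelope but is an assumption for illustration, not the exact format.

```python
def apply_change_event(target, applied_positions, event):
    """Idempotently apply one change event to a downstream key-value target.
    'event' is assumed to look like:
      {"key": "user:7", "op": "u", "after": {...}, "position": 1042}
    where 'position' is a monotonically increasing log offset. Re-delivered
    events (same or older position for the key) are skipped, so at-least-once
    delivery does not double-apply updates."""
    key, position = event["key"], event["position"]
    if applied_positions.get(key, -1) >= position:
        return False                       # duplicate delivery: already applied
    if event["op"] == "d":                 # delete
        target.pop(key, None)
    else:                                  # create or update: upsert the new row image
        target[key] = event["after"]
    applied_positions[key] = position
    return True

target, applied = {}, {}
events = [
    {"key": "user:7", "op": "c", "after": {"email": "a@x.io"}, "position": 1041},
    {"key": "user:7", "op": "u", "after": {"email": "b@x.io"}, "position": 1042},
    {"key": "user:7", "op": "u", "after": {"email": "b@x.io"}, "position": 1042},  # redelivered
]
for e in events:
    apply_change_event(target, applied, e)
print(target)   # {'user:7': {'email': 'b@x.io'}} despite the duplicate
```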

Follow-up questions:

  • How would you use CDC to keep a search index in sync with a database without any lag?
  • What happens when the CDC pipeline falls behind and the database's replication log is truncated?
  • How do you handle schema changes in the source database when CDC consumers expect a specific format?

11. How do you handle replication in a microservices architecture?

What the interviewer is really asking: Can you reason about data replication beyond a single database, addressing the challenges of data consistency across service boundaries?

Answer framework:

In a microservices architecture, each service owns its data and exposes it through APIs. Direct database replication between services violates service boundaries and creates tight coupling. But services often need access to each other's data for their operations. This tension requires careful architectural patterns.

The event-driven approach: services publish events when their data changes. Other services subscribe to these events and maintain local copies of the data they need. For example, the Order Service publishes an OrderPlaced event, and the Shipping Service consumes it to create a shipment record. This is essentially application-level replication where the event bus (Kafka, RabbitMQ) serves the same role as the database replication log.

The API composition approach: services query other services' APIs at read time instead of maintaining local copies. This avoids data duplication but introduces runtime dependencies, latency, and availability concerns. If the Order Service is down, the Shipping Service cannot look up order details.

The shared data platform approach: use a data lake or data warehouse that aggregates data from all services via CDC. Analytical queries and cross-service reports run against this shared platform rather than querying individual service databases. This preserves service autonomy for operational data while enabling cross-service analytics.

Consistency challenges: with event-driven replication, there is always a window where the source service has committed a change but consumers have not yet processed the event. This is analogous to replication lag in database replication. The same mitigation strategies apply: read-your-writes by querying the source service after a write, eventual consistency acceptance for non-critical reads, and causal ordering using event metadata.

The saga pattern coordinates multi-service transactions without distributed locking. Each service executes its local transaction and publishes an event. If a later step fails, compensating transactions undo the earlier steps. This is the distributed equivalent of a database rollback but requires explicit compensation logic in each service.
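
A minimal sketch of the saga pattern as an ordered list of (action, compensation) pairs; the service calls below are hypothetical stand-ins for real inventory and payment APIs.

```python
def run_saga(steps):
    """Execute saga steps in order; on failure, run compensations for the
    steps that already succeeded, in reverse order. Each step is a pair of
    callables (action, compensate)."""
    completed = []
    try:
        for action, compensate in steps:
            action()
            completed.append(compensate)
    except Exception as exc:
        for compensate in reversed(completed):
            compensate()                     # best-effort undo of earlier local transactions
        raise RuntimeError(f"saga aborted and compensated: {exc}") from exc

def reserve_inventory():  print("inventory reserved")
def release_inventory():  print("inventory released")
def charge_payment():     raise RuntimeError("card declined")
def refund_payment():     print("payment refunded")

try:
    run_saga([(reserve_inventory, release_inventory),
              (charge_payment, refund_payment)])
except RuntimeError as e:
    print(e)   # inventory was released because the payment step failed
```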

Choosing between these patterns depends on the use case. For operational data needed at low latency, event-driven replication with local copies is best. For occasional lookups, API composition is simpler. For analytical workloads, the shared data platform scales better.

Follow-up questions:

  • How do you handle the ordering of events across multiple services?
  • What happens when an event consumer fails and events pile up?
  • How do you detect and resolve data drift between services that maintain copies of the same data?

12. What is the split-brain problem in database replication and how do you prevent it?

What the interviewer is really asking: Do you understand one of the most dangerous failure modes in replicated systems and can you implement safeguards against it?

Answer framework:

Split-brain occurs when two nodes in a replicated system both believe they are the primary and accept writes simultaneously. This creates divergent data that is extremely difficult to reconcile. Split-brain is the most feared failure mode in database replication because it can cause data corruption that takes days to identify and fix.

Split-brain typically occurs during network partitions. The primary and its replicas lose network connectivity. The replicas, unable to reach the primary, conclude that the primary has failed and promote one of their own. Meanwhile, the original primary is still running and accepting writes from clients on its side of the partition. Now two primaries exist.

Prevention strategy one: fencing with STONITH (Shoot The Other Node In The Head). When promoting a new primary, the failover system must guarantee that the old primary cannot accept writes. This is typically done by powering off the old primary's machine via an out-of-band management interface (IPMI, iLO, cloud API). Only after confirming the old primary is dead does the new primary start accepting writes.

Prevention strategy two: quorum-based leadership. Use a consensus protocol like Raft or Paxos among the database nodes. A node can only be the primary if it holds the leadership lease, which requires a majority vote. In a network partition, only the side with the majority of nodes can elect a leader. The minority side cannot form a quorum and stops accepting writes. Systems like CockroachDB and etcd use this approach natively.

Prevention strategy three: external distributed lock. Use a distributed coordination service like ZooKeeper or etcd to manage a leadership lock. The primary must continuously renew its lease. If the primary loses connectivity to the lock service, it must stop accepting writes even if it is still running. The new primary acquires the lock only after the old primary's lease expires.
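
A minimal sketch of lease-based self-fencing, where the primary refuses writes once it can no longer prove it holds the lease. The lock-service interface here is a placeholder, not the actual etcd or ZooKeeper client API.

```python
import time

class LeasedPrimary:
    """Primary that only accepts writes while it provably holds the leadership
    lease. 'lock_service' is a stand-in for a coordination service exposing
    renew_lease(node_id, ttl) -> bool; real client APIs differ."""

    def __init__(self, node_id, lock_service, lease_ttl=5.0):
        self.node_id = node_id
        self.lock = lock_service
        self.ttl = lease_ttl
        self.lease_expires_at = 0.0

    def renew(self):
        # Called periodically (well under the TTL) by a background loop.
        if self.lock.renew_lease(self.node_id, self.ttl):
            self.lease_expires_at = time.monotonic() + self.ttl

    def accept_write(self, statement):
        # Self-fencing: if the lease may have expired, refuse the write even
        # though this process is still running. A partitioned old primary
        # therefore stops writing before a new primary can be elected.
        if time.monotonic() >= self.lease_expires_at:
            raise RuntimeError("not leader: lease expired or unrenewed, rejecting write")
        return self._apply(statement)

    def _apply(self, statement):
        return f"applied: {statement}"
```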

Prevention strategy four: witness or arbitrator node. Deploy a lightweight witness node in a third location. During a network partition, the side that can reach the witness maintains the primary. The side that cannot reach the witness steps down. This avoids the cost of running full replicas in a third location.

Detection: even with prevention measures, monitor for split-brain by comparing write sequences across nodes. If two nodes report writes that are not in each other's replication streams, split-brain may have occurred. Alert immediately for manual investigation. Understanding CAP theorem trade-offs is fundamental to reasoning about these scenarios.

Follow-up questions:

  • How would you recover from a split-brain scenario if it occurs despite your safeguards?
  • What are the trade-offs between STONITH and quorum-based approaches?
  • How does split-brain prevention work in a multi-region deployment where network partitions are more common?

13. How does replication work in PostgreSQL versus MySQL, and what are the practical differences?

What the interviewer is really asking: Do you have hands-on experience with real database systems and understand how theoretical replication concepts map to specific implementations?

Answer framework:

PostgreSQL and MySQL both support leader-follower replication but with significant implementation differences that affect production operations. A thorough comparison is available in our PostgreSQL vs MySQL analysis.

PostgreSQL replication: PostgreSQL offers two replication modes. Streaming replication (physical) sends WAL records from the primary to replicas. Replicas are exact byte-for-byte copies and cannot have different schemas or selective table replication. It is simple to set up and efficient because WAL records are already being generated. Logical replication sends logical changes (row-level inserts, updates, deletes) and supports selective table replication, different indexes on replicas, cross-version replication, and publishing to non-PostgreSQL subscribers. Logical replication was introduced in PostgreSQL 10 and has matured significantly.

PostgreSQL supports synchronous replication via the synchronous_standby_names parameter. You can configure synchronous commit at the transaction level, allowing critical transactions to wait for replica acknowledgment while less critical ones use asynchronous commit. PostgreSQL also supports cascading replication where a replica can feed other replicas, reducing the load on the primary.
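
As a sketch of that per-transaction control, assuming psycopg2 and a primary that already has a standby listed in synchronous_standby_names; the connection details and table names are placeholders.

```python
import psycopg2

# Assumes a primary with synchronous_standby_names configured and a reachable
# standby; connection parameters are placeholders.
conn = psycopg2.connect("dbname=app user=app host=primary.internal")

with conn:
    with conn.cursor() as cur:
        # Critical write: wait until the synchronous standby has applied the WAL.
        cur.execute("SET LOCAL synchronous_commit = remote_apply")
        cur.execute("INSERT INTO payments (order_id, amount_cents) VALUES (%s, %s)",
                    (1841, 2599))

with conn:
    with conn.cursor() as cur:
        # Low-value write: acknowledge as soon as the primary has committed locally.
        cur.execute("SET LOCAL synchronous_commit = local")
        cur.execute("INSERT INTO page_views (user_id, path) VALUES (%s, %s)",
                    (7, "/cart"))

conn.close()
```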

MySQL replication: MySQL uses the binary log (binlog) as its replication stream. Three binlog formats are available: statement-based (logs the SQL), row-based (logs the row changes), and mixed (uses statement-based by default, switches to row-based for non-deterministic statements). Row-based is the most reliable for replication.

MySQL offers two replication mechanisms. Traditional replication uses binlog position tracking, where the replica tracks its position in the primary's binlog file. This is fragile across failovers because binlog positions differ between servers. GTID-based replication (Global Transaction Identifiers) assigns a unique ID to each transaction, making failover and replica reconfiguration much simpler because any replica can replicate from any other replica by specifying the GTID set it has already applied.

MySQL Group Replication provides multi-leader replication with built-in conflict detection using a certification-based approach. Transactions execute locally and, at commit time, their write sets are sent to the group for certification. If a conflict is detected, the transaction is rolled back. This is MySQL's answer to Galera Cluster.

Practical differences: PostgreSQL's logical replication is more flexible for selective replication and cross-version upgrades. MySQL's GTID replication makes failover simpler. PostgreSQL's physical replication creates exact copies which simplifies reasoning about consistency. MySQL's Group Replication provides built-in multi-leader support that PostgreSQL lacks natively.

Follow-up questions:

  • When would you choose PostgreSQL's logical replication over streaming replication?
  • How do you handle a MySQL replica that falls behind and the binlog files have been purged?
  • What are the performance implications of enabling synchronous replication in PostgreSQL?

14. How do you design replication for a globally distributed e-commerce platform?

What the interviewer is really asking: Can you apply replication concepts to a complex, real-world system with diverse data types, consistency requirements, and geographic constraints?

Answer framework:

A globally distributed e-commerce platform like Amazon has multiple data domains, each with different replication requirements.

Product catalog: read-heavy, rarely updated, tolerates staleness of seconds to minutes. Use asynchronous leader-follower replication with followers in every region. Cache aggressively using a CDN and application cache. Updates propagate via replication and cache invalidation. This is the simplest data domain.

User accounts and authentication: strong consistency required for security-sensitive operations like password changes and two-factor authentication. Use synchronous replication to at least one follower. For login operations, read from the leader to ensure the latest credentials are used. For profile display, reading from a follower with bounded lag is acceptable.

Inventory: the most consistency-critical domain. Overselling (selling more units than available) has direct financial consequences. For each product, designate a single authoritative shard that handles inventory decrements. Other regions read inventory counts from replicas for display but route purchase transactions to the authoritative shard. Accept the latency cost of cross-region writes for inventory because correctness is non-negotiable.

Order data: once an order is placed, it must never be lost. Use synchronous replication within the region where the order was placed, plus asynchronous cross-region replication for disaster recovery. The order service should confirm the order only after synchronous replication succeeds.

Shopping cart: high write frequency, tolerates eventual consistency. Use multi-leader replication so each region can accept cart modifications locally. If a user modifies their cart in two regions simultaneously, use last-writer-wins or merge the cart contents. The cost of losing a cart item is low compared to the benefit of low-latency cart operations.

Search index: search data is derived from the product catalog via CDC. Replicate the product catalog changes to each region's search service. Search can tolerate seconds of lag since newly added products appearing a few seconds late is acceptable.

Payment processing: never replicate raw payment data across regions due to PCI compliance. Process payments in the region where the transaction originates. Replicate only non-sensitive transaction metadata for analytics.

The overall architecture uses a mix of replication strategies tailored to each domain's consistency and latency requirements, demonstrating that real systems are never one-size-fits-all.

Follow-up questions:

  • How would you handle a flash sale that temporarily overwhelms the inventory service's replication capacity?
  • What is your disaster recovery plan if an entire region goes offline during peak traffic?
  • How do you ensure that order and payment data stay consistent across regions?

15. What are CRDTs and how do they relate to database replication and conflict resolution?

What the interviewer is really asking: Do you understand advanced distributed data structures that enable conflict-free replication, and can you evaluate when they are appropriate versus simpler approaches?

Answer framework:

CRDTs (Conflict-free Replicated Data Types) are data structures that can be replicated across multiple nodes, updated independently and concurrently at each node, and guaranteed to converge to the same state without any conflict resolution protocol. They achieve this by restricting the operations to those that are commutative, associative, and idempotent, meaning the order in which operations are applied does not affect the final result.

There are two categories. State-based CRDTs (convergent): each node maintains the full state. Nodes periodically send their state to other nodes. A merge function combines two states into one. The merge function must be commutative, associative, and idempotent. Example: a G-Counter (grow-only counter) where each node maintains its own count. The merge function takes the maximum of each node's count. The total counter value is the sum across all nodes.
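
A minimal G-Counter sketch shows why the merge is safe to apply in any order, any number of times; the node ids are illustrative.

```python
class GCounter:
    """State-based grow-only counter. Each node increments only its own slot;
    merge takes the per-node maximum, which is commutative, associative, and
    idempotent, so replicas converge regardless of merge order or repetition."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {node_id: 0}

    def increment(self, amount=1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def merge(self, other):
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self):
        return sum(self.counts.values())

a, b = GCounter("node-a"), GCounter("node-b")
a.increment(3)            # 3 increments observed at node A
b.increment(2)            # 2 increments observed at node B
a.merge(b); b.merge(a)    # exchange state in either order, even repeatedly
assert a.value() == b.value() == 5
```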

Operation-based CRDTs (commutative): nodes send operations (not full state) to each other. The operations must be commutative so they can be applied in any order. This uses less bandwidth than state-based but requires reliable delivery of operations.

Common CRDT types include G-Counter (increment only), PN-Counter (increment and decrement using two G-Counters), G-Set (add-only set), OR-Set (observed-remove set that supports adds and removes), LWW-Register (last-writer-wins single value), and LWW-Element-Set (last-writer-wins set). More complex CRDTs exist for text editing (YATA, RGA) and JSON documents.

Practical applications include Redis Enterprise's active-active databases, which are built on CRDTs. Riak uses CRDTs for its data types (counters, sets, maps). Apple uses CRDTs in the Notes app for cross-device synchronization. Figma's real-time collaborative design tool uses CRDT-inspired techniques for multiplayer editing.

Limitations of CRDTs include monotonically growing metadata where tombstones (deletion markers) accumulate and must be garbage collected. Limited expressiveness means not all data structures can be expressed as CRDTs. The semantics can be surprising: an OR-Set's behavior during concurrent add and remove of the same element may not match user expectations. Performance overhead from metadata can be significant for fine-grained data like character-by-character text editing.

When to use CRDTs versus other approaches: use CRDTs when you need multi-leader or leaderless replication with automatic conflict resolution and the data type fits a CRDT model. Use last-writer-wins when conflicts are rare and occasional data loss is acceptable. Use application-level conflict resolution when the conflict semantics are business-specific and cannot be captured by a generic data structure.

Follow-up questions:

  • How would you implement a CRDT-based shopping cart for a multi-region e-commerce platform?
  • What is the memory overhead of CRDTs compared to plain data structures?
  • How do you garbage collect tombstones in an OR-Set without breaking convergence guarantees?

Common Mistakes in Database Replication Interviews

  1. Conflating replication with sharding. Replication copies the same data to multiple nodes for availability and read scaling. Sharding splits different data across nodes for write scaling and storage capacity. They are complementary, not interchangeable. Always clarify which you are discussing.

  2. Assuming replication guarantees consistency. Asynchronous replication means replicas can serve stale data. Many candidates describe adding replicas for read scaling without addressing the consistency implications. Always discuss what consistency model your application requires and how replication lag affects it.

  3. Ignoring failover complexity. Candidates often say "if the primary fails, we promote a replica" without discussing split-brain prevention, data loss quantification, client reconnection, or the operational runbook. Failover is the most critical and error-prone operation in a replicated system.

  4. Not considering replication lag as a first-class metric. Replication lag is not just a database metric; it directly affects user experience. Failing to discuss monitoring, alerting, and application-level mitigation strategies suggests a lack of production experience.

  5. Overlooking operational complexity. Every replica is another database instance that needs monitoring, maintenance, capacity planning, and security patching. Candidates who propose 10 replicas without discussing the operational burden are not demonstrating senior-level thinking.

How to Prepare for Database Replication Interviews

Build hands-on experience first. Set up leader-follower replication in both PostgreSQL and MySQL on your local machine. Intentionally kill the primary and practice failover. Observe replication lag under load. Break replication and fix it. This hands-on experience is more valuable than any amount of reading because it builds intuition for how replication behaves in practice.

Study the theoretical foundations. Read the relevant chapters of Martin Kleppmann's Designing Data-Intensive Applications, which covers replication in exceptional depth. Understand the CAP theorem, consistency models (linearizability, sequential consistency, causal consistency, eventual consistency), and consensus protocols (Raft, Paxos). Our guide on distributed systems provides a structured introduction.

Review real-world case studies. Read engineering blog posts from Google and Amazon about their replication strategies. Study the DynamoDB paper for leaderless replication, the Spanner paper for globally consistent replication, and the Raft paper for consensus-based replication.

Practice applying replication to system design problems. When designing a system, explicitly state the replication strategy for each data store, justify your choices based on the access patterns and consistency requirements, and discuss the operational implications. Our system design interview guide provides a framework for structuring these discussions.

Finally, prepare to discuss replication trade-offs quantitatively. Know the latency numbers: intra-datacenter replication takes 0.5-2ms, cross-region replication takes 50-200ms. Know the data loss implications: with 100ms of replication lag and 1,000 writes per second, failover can lose up to 100 writes. These numbers make your answers concrete and credible. Explore the learning paths and check out our pricing for structured preparation.
