Courses 0%
35
Additional Database Concepts · Chapter 35 of 42

Database Replication

Akhil
Akhil Sharma
20 min

Database Replication

Copying data across multiple database nodes for availability and read scaling — and the consistency trade-offs that come with every replication strategy.

Database Replication: Keeping Multiple Copies in Sync (Your Data's Backup Dancers) 🎯 Challenge 1: The Important Document Problem Imagine this scenario: You have a critically important document - your company's only copy of customer contracts.

Single Copy (No Replication):

yaml

With Replication (Multiple Copies):

yaml

Pause and think: What if every change to your database was automatically copied to multiple servers around the world?

The Answer: Database Replication creates and maintains multiple copies of your data! It's like: ✅ Primary server handles writes (source of truth) ✅ Replica servers copy all changes (automatic sync) ✅ Read from any replica (distribute load) ✅ Failover to replica if primary fails (high availability) ✅ Replicas can be in different locations (disaster recovery)

Key Insight: Replication trades storage space and complexity for availability, performance, and durability!

🎬 Interactive Exercise: Replication vs Backup

You might think: "Isn't replication just fancy backup?"

The Key Differences:

Backup (Snapshot):

yaml

Replication (Real-time):

yaml

The Comparison:

Real-world parallel: Backup is like photos in a safe (for long-term recovery). Replication is like having an understudy in a play (immediate continuity).

🏗️ Types of Replication

Type 1: Primary-Replica (Master-Slave)

yaml

Type 2: Primary-Primary (Multi-Master)

yaml

Type 3: Chain Replication

yaml

Real-world parallel:

  • Primary-Replica = Teacher (primary) and students taking notes (replicas)
  • Primary-Primary = Two teachers co-teaching (both can write on board)
  • Chain = Assembly line (item passes through each station)

🎮 Decision Game: Replication Conflict!

Context: You have Primary-Primary replication. What happens with concurrent writes?

Scenario:

Options for Conflict Resolution:

  1. Last Write Wins (timestamp-based)
  2. First Write Wins (sequence-based)
  3. Merge Values (application logic)
  4. Reject Second Write (pessimistic locking)

Think about: What's safest for a bank account?

Analysis:

The Golden Rule:

yaml

Real-world parallel: Multi-primary is like two people editing the same Google Doc. Works great until both edit the same sentence simultaneously (conflict!).

⚡ Replication Methods: How Data Gets Copied

Method 1: Statement-Based Replication

sql

Method 2: Row-Based Replication (Logical)

sql

Method 3: Physical Replication (Binary)

yaml

PostgreSQL Example:

sql

Real-world parallel:

  • Statement-based = Telling someone "make a sandwich" (might make different sandwich)
  • Row-based = Sending exact sandwich recipe with amounts (reproducible)
  • Physical = Sending exact 3D-printed sandwich (identical copy)

🚨 Common Misconception: "Replicas Are Always Consistent... Right?"

You might think: "Replicas have exact same data as primary."

The Reality: There's always replication lag!

Understanding Replication Lag:

Real-World Scenario:

Solutions:

Solution 1: Read-After-Write Consistency

python

Solution 2: Monotonic Reads

python

Solution 3: Synchronous Replication

sql

Monitoring Replication Lag:

sql

Real-world parallel: Replication lag is like news propagation. Breaking news happens (primary), but it takes time for newspapers (replicas) to print and distribute.

🔧 Failover: When Primary Dies

Automatic Failover Flow:

Failover Tools:

PostgreSQL with Patroni:

yaml

MySQL with Orchestrator:

sql

Application-Level Failover:

python

Real-world parallel: Failover is like vice president becoming president. There's a clear succession plan, happens automatically, minimal disruption.

💡 Multi-Region Replication

Global Setup:

Multi-Primary Multi-Region:

Real-world parallel: Multi-region replication is like having offices in multiple countries. Can serve customers locally (fast), but coordinating between offices takes time.

💡 Final Synthesis Challenge: The Backup Dancers

Complete this comparison: "A database without replication is like a solo performer. A replicated database is like..."

Your answer should include:

  • High availability
  • Read scalability
  • Disaster recovery
  • Trade-offs (lag, complexity)

Take a moment to formulate your complete answer...

The Complete Picture: A replicated database is like a performer with backup dancers:

Primary (lead dancer): Handles all choreography changes (writes) ✅ Replicas (backup dancers): Learn and perform the choreography (reads) ✅ If lead falls: Backup steps up immediately (automatic failover) ✅ Multiple performers: Show can continue in multiple cities (scalability) ✅ Slight delay: Backups learn choreography with small lag (replication lag) ✅ Global reach: Perform simultaneously worldwide (multi-region) ✅ Disaster recovery: Show goes on even if one venue closes

Replication provides:

  1. High Availability - System stays up when primary fails
  2. Read Scalability - Distribute read load across replicas
  3. Disaster Recovery - Data survives datacenter failures
  4. Low Latency - Serve users from nearest replica

Trade-offs:

  • Replication lag (eventual consistency)
  • More complex setup and monitoring
  • Storage cost (multiple copies)
  • Network bandwidth for replication

Real-world examples:

  • Netflix: Read replicas in every region for low latency
  • GitHub: Replicas for high availability
  • Instagram: Replicas for read scaling
  • Stripe: Synchronous replication for financial data

Replication transforms fragile single-server databases into resilient, globally distributed systems!

🎯 Quick Recap: Test Your Understanding Without looking back, can you explain:

  1. What's the difference between replication and backup?
  2. How does Primary-Replica replication work?
  3. What is replication lag and why does it happen?
  4. When should you use synchronous vs asynchronous replication?

Mental check: If you can design a replication strategy, you understand database replication!

🚀 Your Next Learning Adventure Now that you understand replication, explore:

Advanced Replication:

  • Semi-synchronous replication
  • Group replication (MySQL)
  • Logical replication and CDC
  • Cross-datacenter replication

Replication Tools:

  • Patroni (PostgreSQL HA)
  • MySQL InnoDB Cluster
  • MongoDB Replica Sets
  • Redis Sentinel

Related Concepts:

  • Database sharding
  • Distributed consensus (Raft, Paxos)
  • Multi-region architectures
  • Read-write splitting

Real-World Case Studies:

  • How GitHub handles database failover
  • Netflix's multi-region database strategy
  • Stripe's approach to financial data replication
  • Facebook's MySQL replication at scale

Key Takeaways

  1. Replication copies data across multiple nodes for availability and read scaling — if the primary fails, a replica takes over
  2. Synchronous replication guarantees consistency but adds latency — the primary waits for replicas to acknowledge before confirming writes
  3. Asynchronous replication is faster but risks data loss — the primary doesn't wait, so recent writes may be lost if it crashes
  4. Replication lag causes stale reads from replicas — read-after-write consistency requires routing reads to the primary after writes
  5. Leader election handles primary failover — but split-brain scenarios can occur if two nodes both think they're the leader
Chapter complete!

Course Complete!

You've finished all 42 chapters of

System Design Indermediate

Browse courses
Up next Distributed Databases
Continue