MVCC Explained: Multi-Version Concurrency Control in Databases
How MVCC lets databases handle concurrent reads and writes without locking — version chains, snapshot isolation, vacuum, and performance implications.
Multi-Version Concurrency Control (MVCC)
MVCC is a concurrency control method where the database maintains multiple versions of each row, allowing readers to see a consistent snapshot without blocking writers, and writers to proceed without blocking readers.
What It Really Means
Without MVCC, databases face a fundamental conflict: readers and writers need access to the same data simultaneously. The simplest solution is locking — a writer locks the row, blocking all readers until the write completes. This works but destroys performance under concurrent workloads.
MVCC eliminates this conflict by keeping old versions of rows alongside new ones. When a transaction updates a row, the database does not overwrite the old version. Instead, it creates a new version. Concurrent readers continue seeing the old version (the one that was committed when their transaction started), while the writer works on the new version. Readers never block writers. Writers never block readers.
PostgreSQL, MySQL InnoDB, Oracle, and SQL Server all implement MVCC, though each does it differently. PostgreSQL stores old versions in the same table (requiring VACUUM to clean up). MySQL InnoDB stores old versions in a separate undo log. Understanding these differences matters for performance tuning.
How It Works in Practice
PostgreSQL MVCC Example
Version Visibility Rules (PostgreSQL)
Each tuple has two hidden columns:
- xmin: The transaction ID that created this version
- xmax: The transaction ID that deleted/replaced this version (infinity if current)
A tuple is visible to transaction T if:
- xmin is committed AND xmin was committed before T's snapshot
- xmax is not set, OR xmax is not committed, OR xmax committed after T's snapshot
Snapshot Isolation
When a transaction begins (or when each statement executes, depending on isolation level), the database takes a snapshot — a record of which transactions are committed at that moment. The transaction only sees data from committed transactions in its snapshot.
Implementation
Understanding MVCC in PostgreSQL queries:
VACUUM — cleaning up old versions:
MySQL InnoDB vs PostgreSQL MVCC
| Aspect | PostgreSQL | MySQL InnoDB |
|---|---|---|
| Old versions stored | In-place (heap tuple) | Undo log (separate) |
| Cleanup mechanism | VACUUM (external) | Purge thread (automatic) |
| Bloat risk | Yes (if VACUUM falls behind) | Lower (undo log is separate) |
| Read performance | May read dead tuples | Follows undo chain |
| Index handling | HOT updates avoid index bloat | Index points to latest version |
Trade-offs
Benefits:
- Readers never block writers (high concurrency)
- Writers never block readers (consistent reads)
- Snapshot isolation prevents most concurrency anomalies
- No read locks needed (lower overhead than lock-based systems)
Costs:
- Storage overhead from multiple versions (table bloat in PostgreSQL)
- VACUUM overhead in PostgreSQL (CPU, I/O, potential table locks)
- Long-running transactions prevent old versions from being cleaned up
- Write conflicts still require resolution (serialization failures)
Critical operational concern — VACUUM: In PostgreSQL, the single most common production issue with MVCC is VACUUM falling behind. If a long-running transaction holds a snapshot, VACUUM cannot clean up any versions created after that snapshot. The table grows unbounded (bloat), queries slow down, and eventually disk fills up.
Monitor these metrics:
- Dead tuple ratio per table
- Oldest running transaction age
- Autovacuum run frequency and duration
Common Misconceptions
- "MVCC eliminates all concurrency problems" — MVCC prevents dirty reads and non-repeatable reads, but write-write conflicts (two transactions updating the same row) still require conflict resolution. Under Serializable isolation, one transaction aborts.
- "VACUUM is optional" — In PostgreSQL, VACUUM is essential. Without it, table bloat grows unbounded, query performance degrades, and you risk transaction ID wraparound (a catastrophic failure mode).
- "Read-only queries have zero overhead" — Read-only queries in MVCC must check visibility rules for every tuple. In heavily updated tables with many dead tuples, this overhead is significant.
- "MVCC means no locking at all" — MVCC eliminates read locks but writes still acquire row-level locks to prevent concurrent modifications to the same row.
- "All MVCC implementations are the same" — PostgreSQL and MySQL InnoDB implement MVCC very differently. PostgreSQL's in-place versioning requires VACUUM. InnoDB's undo-log approach avoids table bloat.
How This Appears in Interviews
- "How does PostgreSQL handle concurrent reads and writes?" — MVCC with tuple versioning, xmin/xmax, snapshot isolation, and VACUUM.
- "Why is your PostgreSQL table growing even though row count is stable?" — Table bloat from dead tuples. VACUUM is not keeping up, possibly due to long-running transactions.
- "Explain the difference between Read Committed and Repeatable Read" — Read Committed takes a new snapshot per statement. Repeatable Read takes one snapshot at transaction start.
- "Compare optimistic and pessimistic concurrency control" — MVCC is an optimistic approach (proceed without locks, detect conflicts at commit). Compare with pessimistic locking.
Related Concepts
- ACID Properties — MVCC implements the Isolation guarantee
- Database Transactions — transactions define the boundaries of MVCC snapshots
- Optimistic vs Pessimistic Locking — alternative concurrency strategies
- Write-Ahead Logging — WAL records version changes for crash recovery
- Database Indexing — indexes interact with MVCC (HOT updates, index bloat)
- System Design Interview Guide
- Algoroq Pricing — access all concept deep-dives
GO DEEPER
Learn from senior engineers in our 12-week cohort
Our Advanced System Design cohort covers this and 11 other deep-dive topics with live sessions, assignments, and expert feedback.