CAP Theorem Explained: Consistency, Availability, and Partition Tolerance

A clear, practical explanation of the CAP theorem — what it really means, how it applies to real distributed systems, common misconceptions, and how to discuss it in system design interviews.


CAP Theorem

The CAP theorem, proposed by Eric Brewer in 2000 and formally proved by Seth Gilbert and Nancy Lynch in 2002, states that a distributed data store can only provide two of the following three guarantees simultaneously:

  • Consistency (C) — Every read receives the most recent write or an error
  • Availability (A) — Every request receives a non-error response, without the guarantee that it contains the most recent write
  • Partition Tolerance (P) — The system continues to operate despite an arbitrary number of messages being dropped or delayed by the network between nodes

What It Really Means

The CAP theorem is often misunderstood as "pick any two." In reality, network partitions are inevitable in distributed systems — networks fail, switches go down, cables get cut. So partition tolerance is not optional. The real choice is:

When a network partition occurs, do you choose Consistency or Availability?

  • CP systems: During a partition, reject requests that cannot guarantee consistency. Return an error or timeout. The system is consistent but not available.
  • AP systems: During a partition, serve requests with potentially stale data. The system is available but not consistent.

When there is no partition (normal operation), you can have both consistency and availability. The CAP theorem only constrains behavior during a network partition.
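The difference between the two choices can be sketched with a toy replica. This is a minimal illustration, not the behavior of any particular database: the `Replica` class, the `Mode` enum, and the `Unavailable` exception are hypothetical names introduced here.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    CP = "CP"
    AP = "AP"


class Unavailable(Exception):
    """Raised when a CP replica refuses a request it cannot serve consistently."""


@dataclass
class Replica:
    mode: Mode
    value: str = "v1"          # last value this replica has seen
    partitioned: bool = False  # True when cut off from the rest of the cluster

    def read(self) -> str:
        if self.partitioned and self.mode is Mode.CP:
            # CP: refuse rather than risk returning stale data.
            raise Unavailable("cannot confirm the latest value during a partition")
        # AP, or a healthy network: answer from the local copy, which may be stale.
        return self.value


def demo() -> dict:
    """Read from one partitioned replica of each kind and record the outcome."""
    cp = Replica(Mode.CP, partitioned=True)
    ap = Replica(Mode.AP, partitioned=True)
    results = {"ap": ap.read()}
    try:
        cp.read()
        results["cp"] = "ok"
    except Unavailable:
        results["cp"] = "error"
    return results
```

With the network healthy (`partitioned=False`), both modes answer the read; they only behave differently once the partition flag is set.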

How It Works in Practice

CP System Example: Apache ZooKeeper

ZooKeeper uses the ZAB consensus protocol. When a network partition separates the leader from a minority of followers, the minority cannot serve writes (no quorum). Clients connected to the minority partition see errors. The majority partition continues operating normally.

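Behavior during partition: Nodes in the minority partition stop serving requests. Consistency is maintained because every write is ordered by the leader and committed only once a quorum acknowledges it. Reads are answered by whichever server the client is connected to, so they can lag slightly behind the latest write unless the client calls sync first.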
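The quorum rule behind this behavior is simple arithmetic. The sketch below is not ZooKeeper code; `has_quorum` and `partition_outcome` are hypothetical helpers that only illustrate the strict-majority check.

```python
def has_quorum(reachable: int, ensemble_size: int) -> bool:
    """True if `reachable` servers (counting this one) form a strict majority."""
    return reachable > ensemble_size // 2


def partition_outcome(ensemble_size: int, side_a: int) -> dict:
    """Which side of a two-way partition can keep committing writes."""
    side_b = ensemble_size - side_a
    return {
        "side_a_serves": has_quorum(side_a, ensemble_size),
        "side_b_serves": has_quorum(side_b, ensemble_size),
    }
```

For a five-node ensemble split 3/2, only the three-node side keeps a quorum. For a four-node ensemble split 2/2, neither side does, which is one reason odd ensemble sizes are usually recommended.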

AP System Example: Apache Cassandra (default configuration)

Cassandra with consistency level ONE serves reads and writes from any available replica. During a network partition, both sides of the partition continue accepting reads and writes independently.

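Behavior during partition: Both sides serve requests, but they may diverge. When the partition heals, Cassandra brings replicas back in sync through hinted handoff, read repair, and anti-entropy repair, and resolves conflicting values by last-write-wins on the write timestamp. The losing write is silently discarded. This is eventual consistency.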
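Last-write-wins reconciliation can be sketched in a few lines. This is a simplified model, not Cassandra's storage engine: the `lww_merge` function and the `(timestamp, value)` layout are assumptions made for illustration, and ties on timestamp are broken here by comparing the values.

```python
from typing import Dict, Tuple

Versioned = Tuple[int, str]  # (write timestamp, value); hypothetical layout


def lww_merge(a: Dict[str, Versioned], b: Dict[str, Versioned]) -> Dict[str, Versioned]:
    """Merge two diverged replicas, keeping the newest version of each key."""
    merged = dict(a)
    for key, candidate in b.items():
        current = merged.get(key)
        # Tuples compare by timestamp first, then by value as a tie-break.
        if current is None or candidate > current:
            merged[key] = candidate
    return merged
```

If one side wrote `("cart", (100, "apple"))` and the other wrote `("cart", (105, "banana"))`, the merge keeps the second and drops the first without telling either client, which is the main cost of this strategy.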

CP System Example: MongoDB

MongoDB uses a single primary for writes with automatic failover. During a network partition, if the primary is isolated, the remaining nodes elect a new primary (if they have a majority). The old primary steps down and rejects writes.

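Behavior during partition: Writes are temporarily unavailable while a new primary is elected (with default settings, typically about 12 seconds or less). Writes that the old primary accepted but had not yet replicated to a majority may be rolled back when it rejoins. Reads can be served from secondaries if configured, but may be stale.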

Real-World Examples

System | CAP Choice | Rationale
Google Spanner | CP (with caveats) | Uses TrueTime API for globally consistent transactions. Sacrifices availability in extreme partition scenarios.
Amazon DynamoDB | AP (default) / CP (optional) | Default eventually consistent reads for high availability. Offers strongly consistent reads at higher latency.
Redis Cluster | Neither strictly CP nor AP | Replication is asynchronous by default, so acknowledged writes can be lost during failover; the WAIT command reduces but does not eliminate that risk. During a partition, masters on the minority side stop accepting writes once the node timeout elapses.
CockroachDB | CP | Serializable isolation with Raft consensus. Unavailable during partition if quorum is lost.
Riak | AP | Designed for high availability. Uses vector clocks and CRDTs for conflict resolution.
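To make the CRDT idea concrete, here is a minimal grow-only counter (G-Counter), the textbook example of a conflict-free replicated data type. It is a generic sketch, not Riak's implementation; the `GCounter` class and its method names are chosen for illustration.

```python
from typing import Dict


class GCounter:
    """Grow-only counter: each node increments only its own slot."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Per-node maximum is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())
```

Two replicas that each accept increments during a partition can merge in either order afterwards and arrive at the same total, with no write lost and no coordination required.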

Common Misconceptions

Misconception 1: "You always have to give up one of the three"

Not true. When the network is healthy (no partition), a system can be both consistent and available. CAP only forces a choice during partitions. Most of the time, your system operates in a non-partitioned state.

Misconception 2: "CAP means you can pick any two"

Partition tolerance is mandatory for distributed systems. You cannot build a distributed system that is both consistent and available while ignoring network partitions. The choice is between CP and AP during partition events.

Misconception 3: "CA systems exist in distributed computing"

A "CA" system would be one that ignores network partitions. This is only possible with a single-node database (which is not distributed). As soon as you have multiple nodes, you must handle partitions.

Misconception 4: "Consistency in CAP = ACID consistency"

CAP consistency means linearizability — every read sees the latest write. ACID consistency means database invariants (foreign keys, constraints) are maintained. They are different concepts with the same name.

Misconception 5: "AP systems have no consistency guarantees"

AP systems typically offer eventual consistency — all replicas will converge to the same state once the partition heals. Some AP systems provide even stronger guarantees like causal consistency or read-your-writes consistency.

Beyond CAP: The PACELC Framework

Daniel Abadi proposed PACELC as an extension of CAP:

  • PAC: During a Partition, choose Availability or Consistency (same as CAP)
  • ELC: Else (no partition), choose Latency or Consistency

This captures an important real-world trade-off: even without partitions, there is a tension between consistency and latency. Synchronous replication gives consistency but adds latency. Asynchronous replication gives low latency but risks stale reads.
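The latency side of this trade-off can be approximated with a small model. It is a back-of-the-envelope sketch under stated assumptions: replicas are contacted in parallel, `local_ms` is the cost of the local write, and `write_latency_ms` is a hypothetical helper, not an API of any database.

```python
from typing import Sequence


def write_latency_ms(
    local_ms: float,
    replica_rtts_ms: Sequence[float],
    acks_required: int,
) -> float:
    """Time until a write can be acknowledged to the client.

    acks_required = 0 models asynchronous replication; higher values
    model waiting for that many replica acknowledgements.
    """
    if not 0 <= acks_required <= len(replica_rtts_ms):
        raise ValueError("acks_required must be between 0 and the replica count")
    if acks_required == 0:
        return local_ms
    # Replicas are contacted in parallel, so we wait for the k-th fastest ack.
    return local_ms + sorted(replica_rtts_ms)[acks_required - 1]
```

With one nearby replica (2 ms round trip) and one cross-region replica (80 ms), requiring zero, one, or two acknowledgements changes how long the client waits, and in this model the slowest required replica dominates.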

Examples:

  • DynamoDB: PA/EL — Available during partitions, low latency otherwise (eventually consistent by default)
  • MongoDB: PC/EC — Consistent during partitions, consistent otherwise (higher latency due to single primary)
  • Cassandra: PA/EL — Available during partitions, low latency otherwise (tunable per query)
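"Tunable per query" usually comes down to how many replicas a read (R) and a write (W) must reach out of N copies. A common rule of thumb is that reads overlap the latest acknowledged write when R + W > N. The helpers below are a hypothetical sketch of that arithmetic, not a driver API, and they ignore failures and concurrent writers.

```python
def quorum_size(replication_factor: int) -> int:
    """Smallest strict majority of the replicas."""
    return replication_factor // 2 + 1


def read_overlaps_latest_write(n: int, r: int, w: int) -> bool:
    """True when every read set must intersect every write set (R + W > N)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n
```

With N = 3, reading and writing at ONE (R = W = 1) gives no overlap guarantee, while quorum reads and writes (R = W = 2) do, at the cost of waiting for an extra replica on every request.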

How to Discuss CAP in System Design Interviews

  1. Don't just recite the theorem — show you understand the practical implications
  2. State the partition scenario explicitly: "If a network partition separates replicas in US-East and EU-West..."
  3. Explain your choice: "For a banking system, we choose consistency because showing a stale balance could lead to overdrafts"
  4. Mention PACELC if the interviewer asks about latency vs consistency trade-offs
  5. Know real systems: Be ready to classify DynamoDB, Cassandra, MongoDB, ZooKeeper, and Redis in CAP terms

Summary

The CAP theorem tells us that during a network partition, distributed systems must choose between consistency and availability. In practice, many modern databases let you tune this choice per query or per table. Understanding CAP is essential for system design interviews, not as a rule to memorize but as a framework for reasoning about trade-offs in distributed data storage.
