Paxos Consensus Protocol Explained: The Foundation of Distributed Agreement

Demystify the Paxos consensus protocol — proposers, acceptors, and learners, with practical examples from Google Chubby and real interview scenarios.

paxos, consensus, distributed-systems, fault-tolerance, replication

Paxos Consensus Protocol

Paxos is a family of consensus protocols that enable a group of unreliable nodes to agree on a single value, even when some nodes fail or messages are lost, as long as a majority of nodes are operational.

What It Really Means

Distributed systems need agreement. Which node is the leader? What is the committed order of transactions? What is the current cluster configuration? Paxos answers these questions with provable guarantees: once a value is chosen by a majority, no different value can ever be chosen, even if nodes crash and restart.

Leslie Lamport published the Paxos algorithm in 1998 (originally written in 1989) using the metaphor of a Greek parliament on the island of Paxos. The paper was famously difficult to understand — Lamport later published "Paxos Made Simple" in 2001, noting that the algorithm itself is simple but reasoning about distributed systems is inherently complex.

Paxos is the theoretical foundation behind Google's Chubby lock service (which underpins Bigtable, Megastore, and Spanner), Apache ZooKeeper's ZAB protocol (a closely related atomic broadcast protocol, often grouped with the Paxos family), and many internal systems at large tech companies. While Raft has become more popular for new implementations thanks to its clarity, understanding Paxos remains essential for distributed systems interviews and for the theory behind modern consensus.

How It Works in Practice

The Three Roles

  • Proposer: Proposes values and drives the protocol forward
  • Acceptor: Votes on proposals and stores accepted values (the "memory" of the protocol)
  • Learner: Learns the chosen value once consensus is reached

In practice, a single node often plays all three roles simultaneously.

Basic Paxos (Single-Value Consensus)

Phase 1: Prepare

  1. A proposer selects a unique proposal number n, higher than any it has seen
  2. It sends a Prepare(n) message to at least a majority of acceptors
  3. Each acceptor that has not already promised a higher number responds with a Promise(n): it promises never to accept a proposal numbered less than n, and returns the number and value of the proposal it last accepted, if any
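
The acceptor side of Phase 1 fits in a few lines. A minimal sketch (class and field names are illustrative, not from any particular library):

```python
class Acceptor:
    """Sketch of an acceptor's durable state and Prepare handling."""

    def __init__(self):
        self.promised_n = -1        # highest proposal number promised so far
        self.accepted_n = -1        # number of the last accepted proposal
        self.accepted_value = None  # value of the last accepted proposal

    def on_prepare(self, n):
        # Promise only if n beats every number promised before.
        if n > self.promised_n:
            self.promised_n = n
            # Report any previously accepted proposal so the proposer
            # can converge on it in Phase 2.
            return ("promise", self.accepted_n, self.accepted_value)
        return None  # reject: already promised a higher-numbered proposal
```

In a real system this state must be persisted to disk before replying, so a crash and restart cannot break an earlier promise.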

Phase 2: Accept

  4. If the proposer receives promises from a majority, it sends Accept(n, v), where v is either the value from the highest-numbered previously accepted proposal (if any) or the proposer's own value
  5. Each acceptor accepts the request unless it has already promised a higher proposal number
  6. Once a majority of acceptors have accepted, the value is chosen

Key insight: In Phase 2, if any acceptor already accepted a value, the proposer must use that value (not its own). This is what prevents conflicting values from being chosen — it forces convergence.
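
The convergence rule is small enough to show directly. A hedged sketch, assuming each promise carries the acceptor's last accepted (number, value) pair, with None for acceptors that have accepted nothing:

```python
def choose_phase2_value(promises, my_value):
    """Pick v for Accept(n, v) from a majority of promises.

    promises: list of (accepted_n, accepted_value) pairs, where
    accepted_value is None if that acceptor has accepted nothing.
    """
    accepted = [(n, v) for n, v in promises if v is not None]
    if accepted:
        # Forced: adopt the value of the highest-numbered accepted proposal.
        return max(accepted, key=lambda nv: nv[0])[1]
    return my_value  # unconstrained: free to propose our own value


# One acceptor already accepted "A" under proposal 3, so a new
# proposer must adopt "A" rather than its own "B".
assert choose_phase2_value([(3, "A"), (-1, None), (-1, None)], "B") == "A"
```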

Multi-Paxos (Sequence of Values)

Basic Paxos agrees on a single value. Real systems need to agree on a sequence of values (a log). Multi-Paxos runs a separate Paxos instance for each log position, but optimizes by electing a stable leader that runs Phase 1 once and then skips it for subsequent proposals. This is essentially what Raft later formalized into a cleaner protocol.
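
In the steady state, then, the leader only runs Phase 2. A rough sketch (names are illustrative; re-election, log gaps, and persistence are omitted, and `on_accept` is assumed here to take a slot index):

```python
class MultiPaxosLeader:
    """Steady-state sketch: Phase 1 ran once for all slots, so each
    appended value needs only a single Accept round."""

    def __init__(self, acceptors, round_n):
        self.acceptors = acceptors
        self.round_n = round_n  # established by one Prepare covering all slots
        self.next_slot = 0

    def append(self, value):
        slot, self.next_slot = self.next_slot, self.next_slot + 1
        # Phase 2 only: Accept(round_n, slot, value) to every acceptor.
        acks = sum(1 for a in self.acceptors
                   if a.on_accept(self.round_n, slot, value))
        majority = len(self.acceptors) // 2 + 1
        return slot if acks >= majority else None  # chosen on a majority
```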

Real-World: Google Chubby

Chubby is Google's distributed lock service, built on Multi-Paxos. It provides coarse-grained locks, leader election, and a small amount of reliable storage. Google's Bigtable uses Chubby to elect tablet server leaders. Spanner uses Paxos groups for replication within each shard.

Chubby runs with 5 replicas (tolerating 2 failures). A stable leader handles most operations without running full Paxos rounds, making it fast in the common case. The full protocol only runs during leader changes.

Implementation

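The sketch below wires both phases into a runnable single-process simulation of Basic Paxos for one value. It is illustrative only: a production implementation adds networking, durable acceptor state, and retries with higher proposal numbers.

```python
# Single-process simulation of Basic Paxos for a single value.

class Acceptor:
    def __init__(self):
        self.promised_n = -1
        self.accepted_n = -1
        self.accepted_value = None

    def prepare(self, n):
        """Phase 1: promise n, returning the last accepted proposal."""
        if n > self.promised_n:
            self.promised_n = n
            return (self.accepted_n, self.accepted_value)
        return None  # already promised a higher number

    def accept(self, n, value):
        """Phase 2: accept unless a higher number was promised."""
        if n >= self.promised_n:
            self.promised_n = n
            self.accepted_n = n
            self.accepted_value = value
            return True
        return False


def propose(acceptors, n, my_value):
    """Run one round of Basic Paxos; return the chosen value or None."""
    majority = len(acceptors) // 2 + 1

    # Phase 1: gather promises from at least a majority.
    promises = [p for p in (a.prepare(n) for a in acceptors) if p]
    if len(promises) < majority:
        return None  # preempted; retry with a higher proposal number

    # Convergence rule: adopt the highest-numbered accepted value, if any.
    accepted = [(an, av) for an, av in promises if av is not None]
    value = max(accepted, key=lambda p: p[0])[1] if accepted else my_value

    # Phase 2: the value is chosen once a majority accepts it.
    acks = sum(1 for a in acceptors if a.accept(n, value))
    return value if acks >= majority else None


if __name__ == "__main__":
    cluster = [Acceptor() for _ in range(5)]
    print(propose(cluster, n=1, my_value="leader=node-3"))  # leader=node-3
    # A competing proposer with a higher number converges on the same value.
    print(propose(cluster, n=2, my_value="leader=node-7"))  # leader=node-3
```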

Trade-offs

Advantages

  • Proven correct: Formally verified safety properties — no two different values can both be chosen
  • Fault tolerant: Tolerates up to (N-1)/2 node failures in an N-node cluster (see the quorum arithmetic sketched after this list)
  • Flexible: No fixed leader requirement; any node can propose (unlike Raft)
  • Foundation: Understanding Paxos helps you understand every other consensus protocol
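
To make the quorum arithmetic concrete, a worked example matching Chubby's five-replica configuration above:

```python
def fault_tolerance(n):
    """Majority quorum size and crash failures tolerated by n nodes."""
    quorum = n // 2 + 1        # smallest majority
    return quorum, n - quorum  # failures tolerated = (n - 1) // 2

for n in (3, 5, 7):
    print(n, fault_tolerance(n))  # 3 -> (2, 1), 5 -> (3, 2), 7 -> (4, 3)
```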

Disadvantages

  • Complexity: Notoriously difficult to implement correctly — Google's Chubby team found that "the Paxos algorithm, when described in plain English, is very simple. Implementing it correctly is a different matter."
  • Liveness issues: Two proposers can livelock by repeatedly preempting each other with higher proposal numbers. Multi-Paxos solves this by electing a stable leader.
  • Performance: Basic Paxos requires two round trips per consensus decision. Multi-Paxos reduces this to one in the steady state.
  • Hard to debug: Distributed protocols have subtle edge cases around node recovery, message reordering, and simultaneous proposals.

Common Misconceptions

  • "Paxos is obsolete because of Raft" — Raft is a more understandable formulation, but Paxos variants (Flexible Paxos, EPaxos, Fast Paxos) offer capabilities Raft does not, such as leaderless consensus and configurable quorum sizes. Google, Microsoft, and Amazon still use Paxos variants internally.

  • "Paxos requires a leader" — Basic Paxos is leaderless; any proposer can initiate consensus. Multi-Paxos uses a leader for performance, but leadership is an optimization, not a requirement.

  • "Paxos solves Byzantine fault tolerance" — Standard Paxos assumes crash-stop failures (nodes stop responding). It does not handle Byzantine failures (nodes sending incorrect data). PBFT and other protocols address that.

  • "One round of Paxos agrees on a sequence of operations" — A single Paxos instance agrees on one value. You need Multi-Paxos (one instance per log slot) for a replicated log.

How This Appears in Interviews

Paxos appears more in distributed systems theory discussions than in design questions:

  • "Explain the difference between Paxos and Raft" — Raft enforces a strict leader and contiguous log; Paxos allows leaderless operation and log holes. Both provide the same safety guarantees.
  • "Why is Paxos hard to implement?" — Discuss state persistence, crash recovery, reconfiguration, log compaction, and the gap between the algorithm and a production system.
  • "Design a distributed lock service" — Describe Chubby's architecture: Paxos-replicated state machine, coarse-grained locks, leader leases for reads, and session semantics.
  • "What happens during a Paxos leader change?" — Walk through how a new proposer sends Prepare with a higher number, discovers previously accepted values, and continues the protocol. See interview questions.
