Consistent hashing is one of the most elegant algorithms in distributed systems. It solves a fundamental problem: how do you distribute data across N servers in a way that minimizes disruption when servers are added or removed? You'll find it inside Redis Cluster, Amazon DynamoDB, Apache Cassandra, CDNs, and load balancers.
The simplest approach to distributing data across N servers is modulo hashing: hash the key, take the result modulo N, and send the key to that server.
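A minimal sketch in Python (the helper name `modulo_assign` and the MD5-based hash are illustrative choices, not a standard API):

```python
import hashlib

def modulo_assign(key: str, num_servers: int) -> int:
    # Use a stable hash; Python's built-in hash() is randomized per process.
    h = int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")
    return h % num_servers

server = modulo_assign("user:42", 4)  # same server every time, while N = 4
```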
This works beautifully until you need to add or remove a server. Changing N changes `hash(key) % N` for almost every key: going from 4 servers to 5 reassigns roughly 80% of all keys. In a cache layer, this means a ~75–90% cache miss rate every time you scale. Databases would need to physically migrate most of their data. This is catastrophic.
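The scale of the disruption is easy to measure. This sketch (illustrative only; the key format and MD5-based hash are arbitrary choices) counts how many keys change servers when N goes from 4 to 5:

```python
import hashlib

def assign(key: str, num_servers: int) -> int:
    # Stable hash; Python's built-in hash() is randomized per process.
    h = int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")
    return h % num_servers

keys = [f"key:{i}" for i in range(100_000)]
moved = sum(assign(k, 4) != assign(k, 5) for k in keys)
fraction = moved / len(keys)
print(f"{fraction:.0%} of keys moved")  # roughly 80%
```

The 80% figure is not bad luck: a key stays put only when `h % 4 == h % 5`, which holds for just 4 of every 20 hash values.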
If you cache 100 million items and adding one server causes 75 million cache misses — all hitting your database simultaneously — what happens to your database?
Consistent hashing solves this by mapping both keys and servers onto a circular hash space (the "ring").
Step 1: Place servers on the ring

Hash each server's identifier to get a position.
Step 2: Place keys on the ring

Hash each key with the same function to get its position on the same circle.

Step 3: Assign each key to the next server clockwise

Starting from the key's position, walk clockwise; the first server you reach owns the key (wrapping past 0° back to the start if necessary).
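Putting the three steps together, here is a minimal ring in Python (the `Ring` class and MD5-based `position` function are illustrative, not a standard API):

```python
import bisect
import hashlib

def position(s: str) -> int:
    # Map a string to a point on the ring (a 64-bit integer circle).
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

class Ring:
    def __init__(self, servers):
        # Step 1: place servers on the ring, sorted by position.
        self._points = sorted((position(s), s) for s in servers)

    def lookup(self, key: str) -> str:
        # Step 2: place the key on the ring. Step 3: walk clockwise to
        # the first server, wrapping past the end back to the start.
        i = bisect.bisect(self._points, (position(key), ""))
        return self._points[i % len(self._points)][1]

ring = Ring(["server-A", "server-B", "server-C"])
owner = ring.lookup("user:42")  # same server every time for this key
```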
Add Server D at position 190°. Only keys between the previous server's position and 190° move to D; everything else stays put. On average, only 1/N of keys move when adding a server.
Remove Server B at 135°. Only the keys that were on B move, reassigned clockwise to the next server; all others are unaffected. Again: only 1/N of keys move.
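Both properties can be checked directly. A self-contained sketch (the lookup is re-derived inline; all names are illustrative):

```python
import bisect
import hashlib

def _h(s: str) -> int:
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

def owner(key: str, servers) -> str:
    # First server clockwise from the key's position, wrapping around.
    points = sorted((_h(s), s) for s in servers)
    i = bisect.bisect(points, (_h(key), ""))
    return points[i % len(points)][1]

keys = [f"key:{i}" for i in range(10_000)]
before = {k: owner(k, ["A", "B", "C"]) for k in keys}
with_d = {k: owner(k, ["A", "B", "C", "D"]) for k in keys}
moved = [k for k in keys if before[k] != with_d[k]]
# Every key that moved went to the new server D; no other key shuffled.
```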
With only a few physical servers, hash positions may cluster unevenly, leaving one server responsible for a disproportionately large arc of the ring. Virtual nodes (vnodes) solve this: each physical server is represented by multiple positions on the ring, one per vnode.
Now even if physical servers hash to clustered positions, their virtual nodes spread evenly. Load balances across all servers automatically.
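A sketch of the vnode technique (the `"name#i"` suffix scheme and all names are illustrative assumptions):

```python
import bisect
import hashlib
from collections import Counter

def _h(s: str) -> int:
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

def build_ring(servers, vnodes=200):
    # Each server contributes many points: hash("A#0"), hash("A#1"), ...
    return sorted((_h(f"{s}#{i}"), s) for s in servers for i in range(vnodes))

def owner(points, key: str) -> str:
    i = bisect.bisect(points, (_h(key), ""))
    return points[i % len(points)][1]

points = build_ring(["A", "B", "C"])
load = Counter(owner(points, f"key:{i}") for i in range(30_000))
# With 200 vnodes per server, each server lands near 10,000 keys.
```

The imbalance shrinks roughly with the square root of the vnode count, which is why production systems use hundreds of vnodes per node.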
Benefits of vnodes: smoother key distribution, rebalancing work spread across many servers when a node joins or leaves, and easy support for heterogeneous capacity (give bigger servers more vnodes).
Cassandra uses 256 vnodes per node by default.
DynamoDB uses consistent hashing to distribute data across storage nodes. Each partition has a hash range (a slice of the ring), and items are assigned to partitions based on their partition key.
Cassandra's ring architecture is built on consistent hashing with the Murmur3 hash function. 256 virtual nodes per physical node ensure even data distribution.
Redis Cluster uses a fixed-size variant: the keyspace is divided into 16,384 hash slots, a key's slot is CRC16(key) mod 16384, and each node owns a subset of slots (typically contiguous ranges). Resharding moves whole slots between nodes.
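The slot computation itself is small. Per the Redis Cluster spec, the hash is CRC16-CCITT (the XModem variant); this standalone sketch implements it (function names are mine):

```python
def crc16_xmodem(data: bytes) -> int:
    # CRC16-CCITT (XModem): polynomial 0x1021, init 0x0000, no reflection.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: bytes) -> int:
    # Real Redis also honors {hash tag} substrings; omitted here.
    return crc16_xmodem(key) % 16384
```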
CDNs use consistent hashing to route requests for the same URL to the same edge server — maximizing cache hit rates. When edge servers are added or removed, only a fraction of URLs need to be rerouted.
| Strategy | Keys moved on change | Distribution | Complexity |
|---|---|---|---|
| Modulo hashing | ~(N-1)/N (catastrophic) | Even | Trivial |
| Consistent hashing | ~1/N (minimal) | Good with vnodes | Low |
| Rendezvous hashing | ~1/N | Very even | O(N) per lookup |
| Jump consistent hashing | ~1/N | Very even | O(log N), no ring |
Rendezvous hashing: for each key, score every server with hash(key + server_id) and assign the key to the highest-scoring server. Simple, no ring, great distribution, but O(N) per lookup as the server count grows.
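A sketch of the scoring approach (the MD5-based score and the `"key|server"` concatenation are illustrative choices):

```python
import hashlib

def score(key: str, server: str) -> int:
    # Combined hash of key and server id; highest score wins.
    return int.from_bytes(
        hashlib.md5(f"{key}|{server}".encode()).digest()[:8], "big"
    )

def rendezvous(key: str, servers) -> str:
    # O(len(servers)) per lookup: every server is scored.
    return max(servers, key=lambda s: score(key, s))
```

Because each server's score is independent of the others, removing a server only reassigns the keys it was winning; everything else keeps its winner.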
Jump consistent hashing: a compact Google algorithm that computes the bucket in O(log N) with no data structure at all, but it only works with sequentially numbered buckets (you can only add or remove the highest-numbered bucket).
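The algorithm fits in a few lines; this is a Python transcription of Lamping and Veach's published pseudocode, using the constant from their paper (the key must already be a 64-bit integer hash):

```python
def jump_hash(key: int, num_buckets: int) -> int:
    # O(log N) expected iterations; no ring or table is stored.
    b, j = -1, 0
    while j < num_buckets:
        b = j
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF
        j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return b
```

The defining property: growing from N to N+1 buckets either leaves a key in place or moves it to the new bucket N, never shuffles it among existing buckets.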
| Mistake | Why it's wrong | Correct approach |
|---|---|---|
| Using modulo hashing for distributed systems | Adding one server rehashes 75%+ of keys | Use consistent hashing — only 1/N keys move |
| Too few virtual nodes | Uneven distribution across physical servers | Use 100-256 virtual nodes per server (Cassandra default: 256) |
| Not accounting for heterogeneous servers | A powerful server gets the same load as a weak one | Assign more virtual nodes to servers with more capacity |
| Ignoring replication in ring design | Data on a failed node is lost | Replicate to the next N nodes clockwise on the ring |