You’re designing a distributed system where users edit state concurrently—while offline, across regions, with retries, reordering, and partitions. You want availability and low latency, but you also want replicas to converge without a central lock or a consensus round-trip for every update.
You run a chain of three coffee shops: North, Central, and South. Each shop has its own POS system and local database because you want the cashier to keep working even if the internet dies.
Customers can:
The shops sync data whenever the network allows.
Now the internet becomes unreliable:
Yet you still want the system to end up with the same final order state everywhere.
If North and South both modify the same order while partitioned, how do you merge them later?
Hold your answer.
Think of the order like a shared whiteboard in a busy cafe. If multiple baristas write on it at the same time, you don’t want to call a manager to approve every marker stroke. You want rules like:
CRDTs are those rules: data types designed so that replicas can diverge temporarily but converge automatically when they exchange updates.
CRDTs let you choose availability and local writes under partition, while still guaranteeing convergence, without requiring coordination for every update.
Which of the three options (1/2/3) best matches the CRDT approach? (Don’t answer yet; we’ll come back.)
Two cashiers update the same order total while offline:
When they reconnect, you need a single final state.
When distributed systems folks say “conflict-free,” do they mean:
A) There are no conflicts in the business sense B) There are no conflicts at the replication protocol level because merges are well-defined C) The network never partitions
Pause and think.
“Conflict-free” means the system can merge concurrent updates deterministically.
But business semantics can still be tricky:
In a restaurant, two servers might both “update the table status” at the same time. The restaurant can adopt a rule:
That rule is a policy. CRDTs are mechanisms to apply such policies safely under concurrency.
CRDTs eliminate replication-level conflicts by defining merges that are associative, commutative, and idempotent.
Which answer (A/B/C) is correct? Keep it in mind; we’ll reveal soon.
Replicas exchange state or operations in arbitrary order:
If two replicas merge updates in different orders, what property must the merge function have so they still converge?
Pick which ones are necessary.
Mental model: “Merge is like pouring coffee”
Imagine each replica has a cup of coffee representing its local state.
For the final taste to be the same everywhere:
These three properties are the “replication doesn’t care about network weirdness” toolkit.
Most state-based CRDTs use a join-semilattice:
Convergence comes from:
CRDT convergence is a math guarantee: if all replicas eventually receive each other’s updates, they converge regardless of message order/duplication.
Which of the three properties (1/2/3) are required? (We’ll reveal shortly.)
You’re building a “flash sale” inventory service. Stock must never go negative.
CRDTs give you convergence. They do not automatically give you application invariants.
Some invariants are compatible with CRDTs (e.g., monotonic ones like “likes only increase”).
Others require coordination or advanced techniques:
A restaurant can let each branch accept reservations locally (high availability), but ensuring you never overbook across branches might require coordination unless you partition capacity (escrow).
CRDTs solve the “merge” problem; they don’t magically solve all correctness constraints.
Which statement (1/2/3) is correct?
North and South are syncing. You need to decide what to send:
If network messages can be duplicated and reordered, which approach is easier to make correct?
A) State-based (CvRDT): send state, merge via JOIN B) Operation-based (CmRDT): send operations, apply in any order
State-based: “Send the whole inventory list” Like a cafe manager sending the entire current menu board photo to another branch. The receiver compares and takes the union.
Pros:
Cons:
Operation-based: “Send the changes you made” Like texting: “Added oat milk” or “Removed espresso shot.”
Pros:
Cons:
| Aspect | State-based (CvRDT) | Operation-based (CmRDT) |
|---|---|---|
| What you send | Full/partial state | Operations (deltas) |
| Merge/apply | JOIN/merge function | Apply operations |
| Network assumptions | Very weak (dup/reorder OK) | Often needs causal delivery or op IDs |
| Typical cost | More bandwidth, simpler correctness | Less bandwidth, more protocol complexity |
| Example | G-Counter with map of replica counts | RGA/WOOT-like text CRDTs |
[IMAGE: side-by-side diagram showing two replicas; left uses periodic state exchange with merge, right uses streaming ops with causal metadata]
State-based CRDTs push complexity into the data structure; op-based CRDTs push complexity into delivery guarantees and metadata.
Which approach would you choose for a mobile offline-first notes app? Why?
Your product team asks for:
You want to pick CRDTs that match each.
Match each requirement to a CRDT type:
Options: A) G-Counter B) PN-Counter C) OR-Set (Observed-Remove Set) D) LWW-Register
Pause and think. We’ll reveal answers after explaining.
Each cafe branch tracks “number of coffees sold today.” You never decrement.
If each replica keeps a local integer and you sum them, what happens when messages duplicate?
Mental model: “Each branch has its own tally sheet”
Instead of a single shared counter, each branch keeps its own tally.
Total = sum of all cells.
State is a map: replica_id -> count. Merge is element-wise max.
Why max?
[CODE: Python, implement a state-based G-Counter with merge and increment, demonstrate convergence under reordering/duplication]
G-Counter turns a contentious shared integer into independent per-replica monotonic components.
What breaks if you allow decrements in a G-Counter without changing the design?
You track “active orders”:
How can you support decrements while still keeping monotonic state per replica?
PN-Counter = two G-Counters:
Value = sum(P) - sum(N)
Merge each component with element-wise max.
[CODE: Go, implement PN-Counter and show merges]
Same as G-Counter.
You can represent non-monotonic values as the difference of two monotonic ones.
Can PN-Counter guarantee value >= 0 without coordination? Why or why not?
A team says: “We’ll just do last-write-wins with timestamps for everything. That’s a CRDT, right?”
Is LWW a CRDT? If yes, when is it a bad idea?
LWW-Register can be modeled as a CRDT if:
But LWW is often a data loss policy:
LWW is acceptable for some fields:
It’s dangerous for accumulative semantics:
LWW is a valid convergence strategy, but it encodes “discard one side” as your business rule.
Name one domain field where LWW is correct, and one where it’s dangerous.
Customers can add/remove items from a shopping cart while offline.
What should the merged cart contain?
Options:
Pause and think.
Think of a cart as a clipboard that gets copied to each branch. If one branch removes “bagel” without ever seeing that another branch added it, should that removal delete the unseen future addition?
Many systems choose: Observed-Remove semantics.
With sets, “conflict resolution” is really “what does remove mean under concurrency?”
In your product, would users expect add-wins or observed-remove behavior for carts? Why?
You want cart semantics that match user intent:
How can a set remember which add a remove is targeting?
OR-Set assigns a unique tag to each add:
Merge = union of add-tag pairs minus removed tags.
[IMAGE: timeline diagram showing concurrent add and remove of same element with tags; remove only deletes observed tags]
[CODE: TypeScript, implement a simple OR-Set with add tags and remove tombstones; show convergence]
OR-Set encodes causality at the element level by tracking unique add instances.
Why can’t a plain “add-wins set” without tags represent the difference between two concurrent adds of the same element?
Your OR-Set stores removed tags forever. Over months, metadata grows.
Can you delete tombstones safely?
To remove tombstones, you need to know that all replicas have seen the remove (or at least won’t reintroduce the removed tag).
This is the concept of causal stability:
This often requires extra machinery:
A restaurant can throw away old paper receipts only after accounting confirms every branch has reported them.
Many CRDTs trade coordination-free updates for metadata that must be managed with careful compaction.
What could happen if you garbage-collect tombstones too early?
Two devices modify a cart concurrently:
“Correct” depends on intent:
CRDT design is about encoding these semantics explicitly.
CRDTs don’t eliminate product decisions; they force you to make them explicit and consistent under concurrency.
Pick a domain (cart, ACL, feature flags). Which semantics would you choose and why?
You’re building a collaborative note editor.
Two users concurrently insert characters at the same position.
Why can’t we just use an OR-Set of characters?
Text has order. Sets don’t.
Text CRDTs (e.g., RGA, Logoot, LSEQ) assign identifiers to positions so that:
But there are trade-offs:
[IMAGE: depiction of a sequence CRDT where characters have position IDs between neighbors; concurrent inserts create IDs that still sort]
[CODE: Pseudocode, show how a sequence CRDT assigns position IDs and merges concurrent inserts]
Sequence CRDTs replace “index positions” with stable identifiers so concurrent edits can be merged deterministically.
What happens if your position identifiers grow without bound? What engineering techniques mitigate this?
You choose an op-based CRDT for efficiency. Your network can reorder messages.
If operations commute, do you still need causal delivery?
Many op-based CRDTs require either:
Some designs embed enough metadata to relax delivery guarantees.
Mental model: “Kitchen tickets” If the kitchen receives “cancel burger” before it receives “make burger,” it might ignore the cancel and still make the burger later.
Causal delivery ensures dependent instructions arrive in the right relationship.
Op-based CRDTs often require causal ordering or additional metadata to be safe under arbitrary reordering.
Would you rather pay for causal delivery in the network layer or in the CRDT metadata? Why?
Your replicas experience:
Which of these failures do CRDTs handle naturally, and which require additional system design?
Pause and think before reading.
If a branch loses its tally sheet in a fire, it can’t reconstruct how many coffees it sold unless it had backups or receipts.
CRDTs are not a persistence layer. Durability and identity management still matter.
What’s the worst outcome if a replica loses its local CRDT state and rejoins with zeros?
You have 50 replicas. You can’t broadcast full state constantly.
How do systems ensure all replicas eventually receive all updates?
Common approaches:
[IMAGE: gossip anti-entropy diagram showing random peer sync rounds converging]
Instead of sending full state, send only the “delta” since last sync.
[CODE: Rust, sketch delta-state G-Counter where increment returns a delta and merge applies it]
Anti-entropy is the engine; CRDTs are the steering wheel. You need both for convergence in real deployments.
What happens if anti-entropy never completes (e.g., a replica is permanently offline)? Does CRDT convergence still hold?
An engineer says: “Eventual consistency is basically random; you can’t reason about it.”
Is that true for CRDT-based eventual consistency?
CRDTs provide strong guarantees:
What’s not guaranteed:
CRDTs turn eventual consistency from “best effort” into a precisely defined convergence model.
Define SEC in your own words in one sentence.
Your team is choosing between:
What do you “spend” when you choose CRDTs?
Common costs:
Benefits:
| Dimension | CRDT replication | Strong consistency (Raft/Paxos) |
|---|---|---|
| Writes under partition | continue | typically blocked or redirected |
| Latency | local | cross-quorum round trips |
| Convergence | deterministic | immediate agreement |
| Invariants | limited without coordination | easier to enforce |
| Metadata | can grow | typically smaller |
| Use cases | offline-first, collaboration, counters, carts | money transfers, inventory, strict ordering |
CRDTs are a deliberate CAP trade: prioritize availability and convergence over immediate agreement and some invariants.
Name a feature in your system that must be strongly consistent, and one that could be CRDT-based.
You’re building a “team task board”:
For each field, decide the conflict policy:
A practical approach:
Mental model: “Different kitchen stations” Not every part of the order needs the same conflict policy:
CRDT architecture is usually hybrid: CRDT where it fits, coordination where it must.
Which fields in the task board can be modeled as sets/counters, and which need registers?
You need a replicated value where concurrent updates should be merged by taking the maximum numeric value.
Pause and think: is this a CRDT? If yes, what’s the merge?
Answer (reveal): Yes, it can be a state-based CRDT if state monotonically increases and merge = max.
You need a replicated bank balance with deposits and withdrawals, and balance must never go negative.
Pause and think.
Answer (reveal): Plain PN-Counter won’t enforce non-negativity. You need coordination or bounded counter/escrow.
You need a replicated set where remove must always win over add, even if add is concurrent.
Pause and think.
Answer (reveal): Use a remove-wins set variant (e.g., with tombstones dominating adds) or design with causal metadata.
“Is it a CRDT?” is really “Can you define a merge that is deterministic and matches your semantics under concurrency?”
Invent a merge rule for “status” with values {online, away, offline} that behaves well under concurrency.
You want confidence that CRDTs aren’t just academic.
Why are CRDTs attractive in multi-region active-active deployments?
They avoid leader election and synchronous quorum writes for many operations.
CRDTs are a practical tool for active-active and offline-first systems, when semantics allow.
Which of your system’s state is “user intent that can be merged,” versus “global invariant that must be serialized”? List two examples.
Clients can submit CRDT operations directly (offline-first). A malicious client can:
What threats are unique to CRDT-based replication?
CRDTs assume replicas are honest participants unless you add controls. Mitigations:
CRDT correctness is not the same as trust. Convergence doesn’t prevent malicious state changes.
If clients can generate unique tags for OR-Set adds, how do you prevent tag-space abuse?
You’re building “CafeNotes”: a mobile app for baristas to keep shared notes per shift. Requirements:
Pick CRDTs:
[IMAGE: architecture diagram: mobile clients with local CRDT state, sync service doing anti-entropy, conflict-free merges]
Real apps compose multiple CRDTs: maps of registers, sets of IDs, counters, and sequences.
Where would you still use strong consistency in this app (if anywhere)?
Correct choice: 3) Build data types whose merges are deterministic and converge automatically.
Correct answer: B) No conflicts at the replication protocol level because merges are well-defined.
Required: Associativity + Commutativity + Idempotence (for state-based merge under duplication/reordering).
Correct statement: 2) CRDTs guarantee convergence, but invariants may require coordination or special designs.
1 -> A (G-Counter) 2 -> B (PN-Counter) 3 -> C (OR-Set) 4 -> D (LWW-Register)
You’re designing a multi-region, offline-capable “incident response board”:
Invariant risk:
Mitigation:
The best CRDT designs align your domain invariants with monotonic progress so the math works with your business logic.
Use CRDTs when:
Avoid (or constrain) CRDTs when:
Pick one feature you’re building right now. Write down:
If you can’t write a merge policy you’d be happy to explain to a teammate, you’re not ready for CRDTs yet.