System Design: Collaborative Document Editor (Google Docs-scale)
Design a Google Docs-scale collaborative document editor supporting real-time multi-user editing with conflict resolution. Deep dive into Operational Transformation (OT) and CRDT approaches, presence awareness, and persistent change history.
Requirements
Functional Requirements:
- Multiple users edit the same document simultaneously with real-time conflict-free merging
- Rich text editing: bold, italic, headers, lists, tables, inline images
- Presence: see other users' cursors and selections in real time
- Complete revision history: view any past version; restore to any checkpoint
- Comments and threaded replies anchored to document ranges
- Offline editing: changes made offline are merged when connectivity is restored
Non-Functional Requirements:
- Sub-100ms latency for local keystrokes to appear in the editor (local-first)
- Convergence: all collaborators reach the same document state within 2 seconds
- Support documents up to 10MB of content with 50 simultaneous editors
- Revision history retained for 30 days on free tier, indefinitely on paid
- 99.9% availability: document unavailability blocks editing and risks losing unsaved work
Scale Estimation
At Google Docs scale: 1B documents, 50M DAU, an average of 5 active editors per document during collaboration. Operation rate: a fast typist generates ~6 operations/second (characters typed + cursor moves). 50M users × 1% actively typing at peak = 500k concurrent typists = 3M operations/second. Each operation is ~100-200 bytes, so ~600MB/second of operation data at peak. Revision storage: 3M ops/second × 200 bytes × 86,400 seconds ≈ 52TB of raw operations/day (heavily compacted to snapshots in practice).
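The back-of-envelope numbers above can be reproduced directly; all inputs are the figures quoted in the text, not measured data:

```python
# Back-of-envelope check of the scale estimates above.
dau = 50_000_000
typing_fraction = 0.01      # 1% actively typing at peak
ops_per_typist = 6          # ops/second for a fast typist
bytes_per_op = 200          # upper bound of the 100-200 byte range

typists = int(dau * typing_fraction)                    # concurrent typists
ops_per_sec = typists * ops_per_typist                  # peak operation rate
bandwidth_mb = ops_per_sec * bytes_per_op / 1e6         # MB/second of op data
daily_tb = ops_per_sec * bytes_per_op * 86_400 / 1e12   # TB/day, uncompacted

print(typists, ops_per_sec, round(bandwidth_mb), round(daily_tb))
```

Note that the daily figure assumes the peak rate is sustained all day, which is why the text treats ~50TB/day as a loose upper bound before compaction.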
High-Level Architecture
The system is built around two complementary concerns: a Collaboration Engine for real-time multi-user editing and a Document Storage Service for persistence and history. The Collaboration Engine runs on a Document Server that maintains in-memory document state for all actively edited documents. Clients connect via WebSocket to their document's server and send operations. The server applies operations using a concurrency control algorithm and broadcasts them to all collaborators.
Concurrency Control — OT vs. CRDT: Two dominant approaches exist:
Operational Transformation (OT) — the approach used by Google Docs. Each operation is transformed against concurrent operations before application to ensure convergence. For text, Insert(pos, char) and Delete(pos) operations are transformed pair-wise when they are concurrent. The server is the central authority: it assigns a global operation order (revision number) and broadcasts transformed operations to clients. A client receiving a remote operation transforms it against its own unacknowledged local operations before applying it. OT is well understood for plain text but complex to implement correctly for rich text (embedded objects, tables).
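A minimal sketch of the pairwise transform for plain text, assuming operations are simple tuples (`("ins", pos, char)` / `("del", pos)`); a production transform would also need a deterministic tie-break (e.g. by site ID) for inserts at the same position, which this sketch sidesteps by letting the already-committed operation win:

```python
def transform(op, against):
    """Rewrite `op` so it applies correctly after concurrent `against`."""
    kind, pos = op[0], op[1]
    a_kind, a_pos = against[0], against[1]
    if a_kind == "ins":
        if a_pos <= pos:   # a concurrent insert at/before us shifts us right
            pos += 1
    else:
        if a_pos < pos:    # a concurrent delete before us shifts us left
            pos -= 1
    return (kind, pos) if kind == "del" else (kind, pos, op[2])
```

For example, if user A inserts 'X' at index 0 of "abc" while user B concurrently deletes 'c' at index 2, B's delete is transformed to index 3 on A's replica, and both replicas converge on "Xab".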
CRDTs (Conflict-free Replicated Data Types) — used by Figma and Notion. Operations are designed to commute, so applying them in any order yields the same result. For text, a common choice is a sequence CRDT (Logoot, LSEQ, or Yjs's Y.Text) in which each character carries a globally unique position identifier, making concurrent inserts and deletes naturally conflict-free. CRDTs enable true peer-to-peer collaboration without a central authority and support offline editing natively. The trade-offs are larger operation sizes (position IDs) and potential document growth (deleted characters linger as tombstones).
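A toy sequence CRDT in the Logoot/LSEQ family illustrates the position-identifier idea: each character carries a `(fraction, site_id)` key that is globally unique and totally ordered, so concurrent inserts commute with no transformation step. This is an illustrative sketch (tombstones for deletes are omitted), not Yjs's actual algorithm:

```python
import bisect

class SeqCRDT:
    def __init__(self):
        # Sorted list of ((fraction, site_id), char); the key order IS the
        # document order, independent of arrival order.
        self.chars = []

    def insert(self, left, right, site_id, char):
        """Insert between the neighbour keys' fractions `left` and `right`."""
        pos = (left + right) / 2   # dense order: always room between neighbours
        key = (pos, site_id)       # site_id breaks ties deterministically
        bisect.insort(self.chars, (key, char))
        return key

    def text(self):
        return "".join(c for _, c in self.chars)
```

Because every replica sorts by the same keys, two sites inserting concurrently at the same spot end up in the same order everywhere, regardless of delivery order.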
Google Docs uses OT with a central server; Figma uses CRDTs. For this design we implement a server-authoritative OT model with CRDT-inspired offline support.
Core Components
Document Collaboration Server
A stateful service maintaining in-memory representations of actively edited documents. Each document has an OT state: the current document snapshot at the server's committed revision, and a buffer of unacknowledged operations from clients. When an operation arrives from client A: (1) transform the operation against all server operations since the client's last acknowledged revision (OT transformation); (2) apply the transformed operation to the server document state; (3) assign it a revision number; (4) broadcast the transformed operation to all other clients; (5) persist the operation to the operation log. Concurrent edits from multiple clients are reconciled by the transformation step, which guarantees convergence.
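The five-step pipeline above can be sketched for plain text as follows; `DocServer` and the minimal transform/apply helpers are illustrative, and the broadcast/persist steps are stubbed out:

```python
def transform(op, against):
    # Minimal pairwise OT: an earlier insert shifts us right, an earlier
    # delete shifts us left. Ops are ("ins", pos, char) or ("del", pos).
    kind, pos = op[0], op[1]
    if against[0] == "ins" and against[1] <= pos:
        pos += 1
    elif against[0] == "del" and against[1] < pos:
        pos -= 1
    return (kind, pos, *op[2:])

def apply_op(doc, op):
    if op[0] == "ins":
        return doc[:op[1]] + op[2] + doc[op[1]:]
    return doc[:op[1]] + doc[op[1] + 1:]

class DocServer:
    def __init__(self, snapshot):
        self.doc = snapshot
        self.revision = 0
        self.log = []  # log[i] is the committed op that produced revision i+1

    def receive(self, op, client_revision):
        # (1) transform against everything committed since the client synced
        for committed in self.log[client_revision:]:
            op = transform(op, committed)
        # (2) apply to the authoritative state
        self.doc = apply_op(self.doc, op)
        # (3) assign the next revision number
        self.revision += 1
        self.log.append(op)
        # (4) broadcast to other clients and (5) persist would happen here
        return op, self.revision
```

A client that was at revision 0 when the server is already at revision 1 sees its operation transformed against everything it missed before it is applied.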
Operation Log & Persistence Service
All operations are durably logged to Kafka (for immediate replication) and PostgreSQL (for queryable history). Operations are stored as: {doc_id, revision_number, client_id, operation_type, operation_data JSONB, timestamp}. Periodic snapshots (every 100 operations) capture the full document state, enabling efficient history reconstruction without replaying thousands of operations. Snapshots are stored in S3; the snapshot + operations since the snapshot are loaded when a document server opens a document.
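The cold-start load path described above (latest snapshot plus the tail of the operation log) can be sketched as follows, assuming the log stores simple text operations as `("ins", pos, char)` / `("del", pos)` tuples; function names are illustrative:

```python
SNAPSHOT_INTERVAL = 100  # snapshot every 100 operations, per the design

def apply_op(doc, op):
    # Minimal text apply for illustrative ("ins", pos, char) / ("del", pos) ops
    if op[0] == "ins":
        return doc[:op[1]] + op[2] + doc[op[1]:]
    return doc[:op[1]] + doc[op[1] + 1:]

def load_document(snapshot, snapshot_revision, ops_since_snapshot):
    """Rebuild the current state from a snapshot plus the op-log tail."""
    doc, revision = snapshot, snapshot_revision
    for op in ops_since_snapshot:
        doc = apply_op(doc, op)
        revision += 1
    return doc, revision
```

With snapshots every 100 operations, a server never replays more than 99 operations when it opens a document, regardless of total history length.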
Presence Service
Tracks cursor positions and selections for all active editors. Unlike document operations, presence data is ephemeral and eventually consistent — a cursor position doesn't need strong consistency. Clients send cursor update events to the Presence Service (via WebSocket), which stores the latest position per user in Redis with a short TTL (5 seconds — refreshed by heartbeat). Presence data is broadcast to all editors via a separate low-priority WebSocket channel, decoupled from the operation channel so a presence update never delays an operation delivery.
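An in-memory stand-in for the Redis presence store sketches the TTL-plus-heartbeat behavior; class and field names are illustrative, and production would use Redis key expiry (SET with EX / EXPIRE) instead of a local dict:

```python
import time

PRESENCE_TTL = 5.0  # seconds, refreshed by client heartbeat, per the design

class PresenceStore:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._cursors = {}  # (doc_id, user_id) -> (cursor_pos, expires_at)

    def heartbeat(self, doc_id, user_id, cursor_pos):
        """Record the latest cursor position and push the expiry forward."""
        expires = self._clock() + PRESENCE_TTL
        self._cursors[(doc_id, user_id)] = (cursor_pos, expires)

    def active_cursors(self, doc_id):
        """Cursors whose TTL has not lapsed -- stale editors silently vanish."""
        now = self._clock()
        return {u: pos for (d, u), (pos, exp) in self._cursors.items()
                if d == doc_id and exp > now}
```

Because stale entries simply expire, a crashed client's cursor disappears within 5 seconds with no explicit disconnect handling.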
Database Design
Documents: documents (doc_id UUID, owner_id, title, created_at, last_modified_at, current_revision INT, snapshot_revision INT, snapshot_s3_key). Operations: doc_operations (doc_id, revision_number, client_id, user_id, operation JSONB, timestamp) — composite PK (doc_id, revision_number) ensures total order per document. Snapshots: doc_snapshots (doc_id, revision_number, s3_key, snapshot_at). Comments: comments (comment_id, doc_id, user_id, anchor_start, anchor_end, content, resolved), comment_replies (reply_id, comment_id, user_id, content).
Document access control: doc_permissions (doc_id, principal_id, principal_type ENUM(user, group, public), role ENUM(viewer, commenter, editor, owner)). The Document Server checks permissions on WebSocket connection upgrade — unauthorized users are disconnected before seeing any document content.
API Design
WebSocket /ws/v1/documents/{docId}/collab — establishes collaboration session; client receives current snapshot + revision; sends {op_type, op_data, client_revision} frames; receives {op_type, op_data, server_revision, user_id} frames from other collaborators.
GET /api/v1/documents/{docId}/history?from_revision=&to_revision= — returns operation log for history view.
POST /api/v1/documents/{docId}/restore?revision={n} — restores document to a past revision (creates a new operation that replaces the current state).
POST /api/v1/documents/{docId}/comments — creates a comment anchored to a document range.
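The WebSocket frames above can be illustrated as JSON payloads; field names follow the API description, while all values are made up:

```python
import json

# Client -> server: an operation made at the client's last known revision
client_frame = {"op_type": "ins",
                "op_data": {"pos": 5, "char": "a"},
                "client_revision": 41}

# Server -> other collaborators: the transformed, committed operation
server_frame = {"op_type": "ins",
                "op_data": {"pos": 6, "char": "a"},
                "server_revision": 42,
                "user_id": "u-123"}

wire = json.dumps(client_frame)  # frames travel as JSON text over the socket
```

Note the position shift between the two frames: the server may have transformed the operation against concurrent edits before broadcasting it.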
Scaling & Bottlenecks
The document server is stateful — a document must be loaded on exactly one server for OT to work correctly (a central authority is required for operation ordering). This is the fundamental scalability constraint: a single document cannot be served by multiple servers simultaneously without complex distributed consensus. Google's solution: documents are sharded across servers by doc_id; a routing layer directs all WebSocket connections for a document to the same server. When a server becomes overloaded, documents are migrated (with a brief handoff pause) to other servers.
Operation log storage grows indefinitely for long-lived documents. Compaction strategies: (1) periodic snapshots reduce the number of operations that need to be replayed on cold start; (2) operations older than 30 days (free tier) are deleted and only snapshots are retained; (3) semantic compression merges consecutive character insertions from the same user into a single operation, reducing log size by 10-50x.
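The semantic-compression step (strategy 3) can be sketched as a single pass that collapses a left-to-right typing run by one user into one insertion; the operation shape (`("ins", pos, text, user)`) is illustrative:

```python
def compact(ops):
    """Merge consecutive same-user insertions at adjacent positions."""
    out = []
    for op in ops:
        if (out and op[0] == "ins" and out[-1][0] == "ins"
                and op[3] == out[-1][3]                        # same user
                and op[1] == out[-1][1] + len(out[-1][2])):    # adjacent pos
            prev = out.pop()
            out.append(("ins", prev[1], prev[2] + op[2], prev[3]))
        else:
            out.append(op)
    return out
```

A burst of single-character inserts from one typist collapses to one log entry, which is where the 10-50x reduction quoted above comes from.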
Key Trade-offs
- OT vs. CRDT: OT requires a central server for operation ordering, making it simpler to implement correctly for complex document types (rich text, tables) but incompatible with true peer-to-peer or offline-first collaboration; CRDTs are naturally offline-capable and decentralized but have larger operation overhead and tombstone accumulation over time.
- Server-authoritative vs. peer-to-peer: Server authority simplifies conflict resolution and provides a single source of truth for history and permissions, but creates a scaling bottleneck (one active server per document) and a single point of failure; P2P scales trivially but requires full CRDT implementation and complicates permission enforcement.
- Fine-grained vs. coarse-grained operations: Character-level OT operations (Insert(5, 'a')) enable precise conflict resolution but generate 3M operations/second at scale; word-level or paragraph-level operations reduce volume but produce coarser conflict resolution that may overwrite user changes.
- Snapshot frequency: Frequent snapshots (every 10 operations) make cold start fast but increase S3 storage and write costs; infrequent snapshots (every 1,000 operations) minimize storage but cause slow cold starts for actively edited documents with long histories.