System Design: CQRS Architecture
Design a CQRS (Command Query Responsibility Segregation) system that separates read and write models for optimal scaling, using event-driven synchronization between the command side and query-optimized projections.
Requirements
Functional Requirements:
- Separate the write model (commands that mutate state) from the read model (queries that return data)
- Maintain query-optimized read models (projections) synchronized with the write model via events
- Commands enforce business invariants (validation, authorization) before writing
- Queries are served from pre-materialized views without touching the write store
- Support multiple independent read models for different query patterns (list view, detail view, search index)
- Handle projection rebuild: replay event history to reconstruct a read model from scratch
Non-Functional Requirements:
- Write-side latency under 20ms p99 (command execution + event publish)
- Read-side latency under 5ms p99 (projection lookup from optimized store)
- Read-write ratio 100:1 — the system must scale reads independently from writes
- Eventual consistency lag between write and read sides under 500ms at p99
Scale Estimation
An e-commerce system: 10,000 writes/sec (orders, inventory updates), 1,000,000 reads/sec (product queries, order status checks). The read:write ratio of 100:1 drives the CQRS decision — a single normalized database cannot serve this load pattern without sacrificing either write performance (indexing overhead) or read performance (lack of denormalization). Write store: ~1 TB (normalized relational data + event log). Read stores: multiple projections, each 5-20 TB (denormalized, pre-joined data optimized for specific query patterns).
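As a quick sanity check on these numbers (the average event payload size below is an illustrative assumption, not a measured figure):

```python
# Back-of-envelope check of the stated load.
writes_per_sec = 10_000
reads_per_sec = 1_000_000
avg_event_bytes = 1_000  # assumed average domain-event payload

print(f"read:write ratio = {reads_per_sec // writes_per_sec}:1")
print(f"event ingest = {writes_per_sec * avg_event_bytes / 1e6:.0f} MB/s into the log")
# A single normalized database would have to index for both sides at once;
# CQRS lets the ~10 MB/s write path and the 1M req/s read path scale separately.
```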
High-Level Architecture
The command side handles all state mutations. Commands arrive at command handlers (domain services), which validate business rules, load the aggregate from the write store, apply the domain logic, persist the updated state, and publish domain events. The write store is a normalized relational database (PostgreSQL) optimized for ACID transactions and business rule enforcement — not for serving complex queries.
Domain events (published after successful commands) are consumed by projection builders that maintain the read side. Each projection builder subscribes to relevant events and updates a query-optimized store. Different query patterns get different stores: a product catalog projection in Elasticsearch (full-text search), an order summary projection in Redis (fast key-value lookup), a reporting projection in ClickHouse (analytical queries). Projections are rebuilt entirely when a bug is fixed or a new query pattern emerges — the event log provides the historical data needed.
The API layer routes requests: commands (POST, PUT, DELETE) go to command handlers; queries (GET) go directly to read model stores, bypassing the write store entirely. This routing is enforced at the application layer or API gateway level.
Core Components
Command Handler
Command handlers are the gatekeepers of the write side. A command like PlaceOrder(user_id, items, payment) is processed as: (1) validate inputs; (2) check authorization; (3) load the Order aggregate from the write store; (4) call aggregate.placeOrder(items, payment) which enforces business invariants (inventory available, payment method valid); (5) if successful, persist the new state to PostgreSQL; (6) publish OrderPlaced event to Kafka. Steps 5 and 6 must be atomic — use the outbox pattern: write both the state update and the event to PostgreSQL in the same transaction; a separate outbox relay process reads unpublished events and writes to Kafka. This avoids the dual-write problem (state change without event, or event without state change).
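A minimal sketch of this flow with psycopg, where `load_order_aggregate`, `save_order_aggregate`, and the Order aggregate itself are hypothetical stand-ins for the domain layer:

```python
import json
import uuid

import psycopg  # assumed driver; any client with transactions works

def handle_place_order(conn: psycopg.Connection, user_id: str,
                       items: list, payment: dict) -> str:
    """PlaceOrder command handler. Steps (1)-(2), input validation and
    authorization, are assumed to have run before this point."""
    with conn.transaction():  # steps (5) and (6) commit or roll back together
        order = load_order_aggregate(conn, user_id)   # (3) hypothetical repository
        event = order.place_order(items, payment)     # (4) raises if an invariant fails
        save_order_aggregate(conn, order)             # (5) persist normalized state
        conn.execute(                                 # (6) outbox row, same transaction
            "INSERT INTO outbox_events (id, aggregate_id, event_type, payload, published) "
            "VALUES (%s, %s, %s, %s, false)",
            (str(uuid.uuid4()), order.id, "OrderPlaced", json.dumps(event)),
        )
    # No direct Kafka call here: the outbox relay publishes after commit.
    return order.id
```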
Projection Builder
Projection builders are event consumers that maintain read models. Each builder processes events in order (Kafka partition order) and applies idempotent updates to the projection store. A product inventory projection handles InventoryAdjusted events: update the projected inventory count in Redis for the affected SKU. The projection tracks its Kafka consumer offset — on restart, it resumes from the last committed offset, replaying any unprocessed events. Projection updates must be idempotent (same event processed twice produces the same result) because Kafka delivery is at-least-once. Idempotency is achieved via upsert operations with the event's offset as the idempotency key.
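A sketch of such a builder, assuming confluent-kafka and redis-py; the topic name and event shape are illustrative. Because InventoryAdjusted carries the absolute quantity here, the write is naturally idempotent, and the stored offset guards against reapplying already-processed events:

```python
import json

import redis
from confluent_kafka import Consumer  # assumed client library

r = redis.Redis()
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "inventory-projection",
    "enable.auto.commit": False,   # commit only after the projection is updated
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["inventory-events"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value())
    if event["type"] == "InventoryAdjusted":
        sku = event["sku"]
        # The event carries the absolute quantity, so reprocessing after a
        # crash produces the same projected value; the stored offset lets us
        # skip events this projection has already applied.
        applied = r.hget(f"inventory:{sku}", "offset")
        if applied is None or msg.offset() > int(applied):
            r.hset(f"inventory:{sku}", mapping={
                "quantity": event["new_quantity"],
                "offset": msg.offset(),
            })
    consumer.commit(message=msg)  # at-least-once: commit follows the store update
```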
Outbox Pattern
The outbox pattern solves the dual-write problem. Instead of writing to both PostgreSQL and Kafka in sequence (which risks partial failure), the command handler writes only to PostgreSQL: the state change AND an outbox_events table entry in the same database transaction. A separate process (the outbox relay) polls the outbox_events table, publishes to Kafka, and marks entries as published. This ensures events are published if and only if the state change was committed. Transactional outbox with change data capture (CDC — e.g., Debezium reading PostgreSQL WAL) is more efficient than polling: Debezium streams WAL changes directly to Kafka in near real time without polling overhead.
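A polling relay, sketched with psycopg and confluent-kafka against the outbox table from the Database Design section below. FOR UPDATE SKIP LOCKED is one common way to let several relay instances share the table safely:

```python
import time

import psycopg
from confluent_kafka import Producer  # assumed client library

producer = Producer({"bootstrap.servers": "localhost:9092"})

def relay_once(conn: psycopg.Connection) -> None:
    """Publish one batch of unpublished outbox rows, then mark them published.
    A crash between flush() and the UPDATE re-publishes the batch on restart;
    that is fine, because delivery is at-least-once and projections are idempotent."""
    with conn.transaction():
        rows = conn.execute(
            "SELECT id, aggregate_id, event_type, payload::text FROM outbox_events "
            "WHERE NOT published ORDER BY created_at LIMIT 100 "
            "FOR UPDATE SKIP LOCKED"  # lets concurrent relay instances coexist
        ).fetchall()
        if not rows:
            return
        for _, aggregate_id, event_type, payload in rows:
            # Key by aggregate_id: all events for one aggregate land on one Kafka
            # partition, preserving the per-aggregate order projections rely on.
            producer.produce("domain-events", key=str(aggregate_id),
                             value=payload, headers={"event_type": event_type})
        producer.flush()  # wait for broker acks before marking rows published
        conn.execute(
            "UPDATE outbox_events SET published = true WHERE id = ANY(%s)",
            ([r[0] for r in rows],),
        )

with psycopg.connect("dbname=orders") as conn:
    while True:
        relay_once(conn)
        time.sleep(0.5)  # polling interval: adds directly to end-to-end lag
```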
Database Design
Write store (PostgreSQL): normalized tables representing domain aggregates. orders (id, user_id, status, total, created_at), order_items (order_id, product_id, quantity, price), outbox_events (id, aggregate_id, event_type, payload, published, created_at). The outbox table is the event bus bridge. Write store is optimized for ACID transactions, not for complex queries.
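A DDL sketch of that schema; column types, constraints, and the partial index are illustrative assumptions:

```python
import psycopg

STATEMENTS = [
    """CREATE TABLE orders (
        id         uuid PRIMARY KEY,
        user_id    uuid NOT NULL,
        status     text NOT NULL,
        total      numeric(12,2) NOT NULL,
        created_at timestamptz NOT NULL DEFAULT now()
    )""",
    """CREATE TABLE order_items (
        order_id   uuid NOT NULL REFERENCES orders(id),
        product_id uuid NOT NULL,
        quantity   int NOT NULL CHECK (quantity > 0),
        price      numeric(12,2) NOT NULL,
        PRIMARY KEY (order_id, product_id)
    )""",
    """CREATE TABLE outbox_events (
        id           uuid PRIMARY KEY,
        aggregate_id uuid NOT NULL,
        event_type   text NOT NULL,
        payload      jsonb NOT NULL,
        published    boolean NOT NULL DEFAULT false,
        created_at   timestamptz NOT NULL DEFAULT now()
    )""",
    # The relay scans only unpublished rows; a partial index keeps that scan cheap.
    """CREATE INDEX outbox_unpublished_idx
        ON outbox_events (created_at) WHERE NOT published""",
]

with psycopg.connect("dbname=orders") as conn:
    for stmt in STATEMENTS:
        conn.execute(stmt)
    conn.commit()
```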
Read stores (varied by query pattern): Redis for order status lookup (order:{id} → {status, summary}); Elasticsearch for product search (full-text + faceted filtering); PostgreSQL read replica for reports that need SQL but don't need real-time freshness; ClickHouse for analytical dashboards. Each projection store is sized for its workload independently — Elasticsearch scales for search throughput, Redis for point-lookup latency.
API Design
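The routing rule from the architecture section maps directly onto HTTP verbs: POST /orders invokes the PlaceOrder command handler and returns 202 Accepted with the new order id (202 rather than 200 signals that read models may briefly lag the write); GET /orders/{id} is served from the Redis order-summary projection and never touches the write store. A minimal routing sketch with FastAPI, reusing hypothetical handlers from the sections above; paths and payload shapes are assumptions:

```python
from fastapi import FastAPI, HTTPException  # assumed framework; any router works

app = FastAPI()

@app.post("/orders", status_code=202)
def place_order(cmd: dict):
    # Command path: routed to the write side; 202 because projections lag briefly.
    order_id = handle_place_order_command(cmd)  # hypothetical handler wrapper
    return {"order_id": order_id}

@app.get("/orders/{order_id}")
def get_order(order_id: str):
    # Query path: served from the Redis projection, bypassing the write store.
    summary = read_order_summary_projection(order_id)  # hypothetical projection read
    if summary is None:
        raise HTTPException(status_code=404)
    return summary
```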
Scaling & Bottlenecks
The write side bottleneck is the outbox relay latency — the time from command commit to event publication to Kafka. CDC-based relay (Debezium) achieves <100ms lag by streaming WAL changes. Polling-based relay adds latency proportional to the polling interval (typically 500ms-1s). The projection sync lag (from Kafka publish to read model update) is typically 100-500ms — acceptable for most use cases, but read-your-own-writes scenarios require special handling (read from write store immediately after a command, then switch to read model for subsequent reads).
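One common shape for the read-your-own-writes handling, sketched below with hypothetical helpers: the command endpoint returns the committed aggregate version, the client echoes it on its next read, and the query side falls back to the write store only while the projection is behind:

```python
def get_order_summary(order_id: str, min_version: int | None = None):
    """Read-your-own-writes guard: serve from the projection unless it is stale.

    min_version is the aggregate version the client saw in the command response;
    all names here are illustrative."""
    summary = read_order_summary_projection(order_id)      # hypothetical Redis read
    if min_version is not None and (summary is None
                                    or summary["version"] < min_version):
        # Projection hasn't caught up yet: fall back to the write store once.
        return read_order_from_write_store(order_id)       # hypothetical PG read
    return summary
```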
Projection rebuild is a scaling challenge for large event histories. A full rebuild from 100 billion events takes days with a single builder. Parallel rebuild: partition the event log by time range and run multiple builders in parallel, each covering a time window; merge the results at the end (see the sketch below). This cuts wall-clock rebuild time by roughly the parallelism factor, provided the projection's partial results are mergeable, e.g., last-write-wins values or additive aggregates.
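A sketch of the fan-out and merge, assuming a hypothetical replay_events(start, end) reader and the absolute-quantity event shape used earlier; the last-write-wins merge is what makes time windows combinable in any order:

```python
from concurrent.futures import ProcessPoolExecutor
from datetime import datetime

def rebuild_window(start: datetime, end: datetime) -> dict:
    """Replay one time slice of the event log into a partial projection."""
    partial: dict = {}
    for event in replay_events(start, end):  # hypothetical event-log reader
        if event["type"] == "InventoryAdjusted":
            prev = partial.get(event["sku"])
            if prev is None or event["ts"] > prev[0]:
                partial[event["sku"]] = (event["ts"], event["new_quantity"])
    return partial

def parallel_rebuild(start: datetime, end: datetime, workers: int = 16) -> dict:
    step = (end - start) / workers
    windows = [(start + i * step, start + (i + 1) * step) for i in range(workers)]
    merged: dict = {}
    with ProcessPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(rebuild_window, *zip(*windows)):
            # Keep the latest absolute quantity per SKU across all windows
            # (last-write-wins on event timestamp), so merge order is irrelevant.
            for sku, (ts, qty) in partial.items():
                prev = merged.get(sku)
                if prev is None or ts > prev[0]:
                    merged[sku] = (ts, qty)
    return merged
```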
Key Trade-offs
- Eventual vs. strong consistency: CQRS inherently introduces eventual consistency between write and read sides — systems requiring always-consistent reads should consider whether CQRS is appropriate
- Complexity vs. scalability: CQRS adds significant architectural complexity (two models, event pipeline, projection builders) that is only justified when read and write patterns diverge significantly
- Projection granularity: One projection per query pattern maximizes read performance but multiplies the number of projections to maintain; fewer, more general projections simplify operations but may not be optimally indexed for all queries
- Synchronous vs. asynchronous commands: Synchronous command handling (wait for persistence before returning) provides immediate confirmation; async (acknowledge before persistence, process in background) improves throughput but requires clients to poll for results