System Design: CQRS Architecture

Design a CQRS (Command Query Responsibility Segregation) system that separates read and write models for optimal scaling, using event-driven synchronization between the command side and query-optimized projections.

16 min read · Updated Jan 15, 2025
system-design · cqrs · event-driven · read-model · infrastructure

Requirements

Functional Requirements:

  • Separate the write model (commands that mutate state) from the read model (queries that return data)
  • Maintain query-optimized read models (projections) synchronized with the write model via events
  • Commands enforce business invariants (validation, authorization) before writing
  • Queries are served from pre-materialized views without touching the write store
  • Support multiple independent read models for different query patterns (list view, detail view, search index)
  • Handle projection rebuild: replay event history to reconstruct a read model from scratch

Non-Functional Requirements:

  • Write-side latency under 20ms p99 (command execution + event publish)
  • Read-side latency under 5ms p99 (projection lookup from optimized store)
  • Read-write ratio 100:1 — the system must scale reads independently from writes
  • Eventual consistency lag between write and read sides under 500ms for 99th percentile

Scale Estimation

An e-commerce system: 10,000 writes/sec (orders, inventory updates), 1,000,000 reads/sec (product queries, order status checks). The read:write ratio of 100:1 drives the CQRS decision — a single normalized database cannot serve this load pattern without sacrificing either write performance (indexing overhead) or read performance (lack of denormalization). Write store: ~1 TB (normalized relational data + event log). Read stores: multiple projections, each 5-20 TB (denormalized, pre-joined data optimized for specific query patterns).

High-Level Architecture

The command side handles all state mutations. Commands arrive at command handlers (domain services), which validate business rules, load the aggregate from the write store, apply the domain logic, persist the updated state, and publish domain events. The write store is a normalized relational database (PostgreSQL) optimized for ACID transactions and business rule enforcement — not for serving complex queries.

Domain events (published after successful commands) are consumed by projection builders that maintain the read side. Each projection builder subscribes to relevant events and updates a query-optimized store. Different query patterns get different stores: a product catalog projection in Elasticsearch (full-text search), an order summary projection in Redis (fast key-value lookup), a reporting projection in ClickHouse (analytical queries). Projections are rebuilt entirely when a bug is fixed or a new query pattern emerges — the event log provides the historical data needed.

The API layer routes requests: commands (POST, PUT, DELETE) go to command handlers; queries (GET) go directly to read model stores, bypassing the write store entirely. This routing is enforced at the application layer or API gateway level.

Core Components

Command Handler

Command handlers are the gatekeepers of the write side. A command like PlaceOrder(user_id, items, payment) is processed as: (1) validate inputs; (2) check authorization; (3) load the Order aggregate from the write store; (4) call aggregate.placeOrder(items, payment) which enforces business invariants (inventory available, payment method valid); (5) if successful, persist the new state to PostgreSQL; (6) publish OrderPlaced event to Kafka. Steps 5 and 6 must be atomic — use the outbox pattern: write both the state update and the event to PostgreSQL in the same transaction; a separate outbox relay process reads unpublished events and writes to Kafka. This avoids the dual-write problem (state change without event, or event without state change).
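The six steps above can be sketched as follows. This is a minimal, in-memory illustration: the `orders` dict and `outbox_events` list stand in for PostgreSQL tables, and in production steps 5 and 6 run inside one database transaction; all names here are illustrative, not a real framework API.

```python
import json
import uuid

# In-memory stand-ins for the PostgreSQL tables. In production, both writes
# below happen inside a single database transaction (the outbox pattern).
orders = {}          # orders table: order_id -> row
outbox_events = []   # outbox_events table: pending domain events

def place_order(user_id, items, inventory):
    """PlaceOrder command handler: validate, enforce invariants, then
    persist the new state and the domain event together (steps 1-6)."""
    # (1) validate inputs
    if not items:
        raise ValueError("order must contain at least one item")
    # (4) enforce a business invariant: inventory must be available
    for sku, qty in items.items():
        if inventory.get(sku, 0) < qty:
            raise ValueError(f"insufficient inventory for {sku}")
    order_id = str(uuid.uuid4())
    # (5) persist new state and (6) enqueue the event; these two writes
    # belong to the same transaction so they commit or fail together
    orders[order_id] = {"user_id": user_id, "items": items, "status": "placed"}
    outbox_events.append({
        "id": str(uuid.uuid4()),
        "aggregate_id": order_id,
        "event_type": "OrderPlaced",
        "payload": json.dumps({"order_id": order_id, "items": items}),
        "published": False,
    })
    return order_id
```

Note that the handler never publishes to Kafka directly; the outbox relay (described below under Outbox Pattern) owns that step.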

Projection Builder

Projection builders are event consumers that maintain read models. Each builder processes events in order (Kafka partition order) and applies idempotent updates to the projection store. A product inventory projection handles InventoryAdjusted events: update the projected inventory count in Redis for the affected SKU. The projection tracks its Kafka consumer offset — on restart, it resumes from the last committed offset, replaying any unprocessed events. Projection updates must be idempotent (same event processed twice produces the same result) because Kafka delivery is at-least-once. Idempotency is achieved via upsert operations with the event's offset as the idempotency key.
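The idempotency mechanism can be sketched like this. A dict stands in for the Redis projection and a set of applied offsets serves as the idempotency keys; a real builder would commit the Kafka offset atomically with the projection write, which this sketch only simulates.

```python
# In-memory stand-ins for the Redis projection and the offset store.
inventory_projection = {}   # sku -> projected inventory count
applied_offsets = set()     # Kafka offsets already applied (idempotency keys)

def handle_inventory_adjusted(offset, event):
    """Apply an InventoryAdjusted event at most once per offset, so
    at-least-once Kafka delivery cannot double-count an adjustment."""
    if offset in applied_offsets:
        return  # duplicate delivery: no-op, projection stays correct
    sku = event["sku"]
    inventory_projection[sku] = inventory_projection.get(sku, 0) + event["delta"]
    applied_offsets.add(offset)  # record the offset with the update
```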

Outbox Pattern

The outbox pattern solves the dual-write problem. Instead of writing to PostgreSQL and Kafka in sequence (which risks partial failure), the command handler writes only to PostgreSQL: the state change AND an outbox_events table entry in the same database transaction. A separate process (the outbox relay) polls the outbox_events table, publishes to Kafka, and marks entries as published. This ensures events are published if and only if the state change was committed. A more efficient variant replaces polling with change data capture (CDC): Debezium tails the PostgreSQL WAL and streams committed outbox entries to Kafka in near real time.
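The polling relay's core loop is simple; a minimal sketch, where `publish` stands in for a Kafka producer call and the outbox is a plain list rather than a table:

```python
def relay_outbox(outbox, publish):
    """Polling outbox relay: read unpublished rows in insertion order,
    hand each to the broker, then mark it published."""
    for row in outbox:
        if row["published"]:
            continue
        publish(row["event_type"], row["payload"])  # e.g. a Kafka send
        row["published"] = True  # mark only after a successful publish
```

Marking the row only after the publish succeeds means a crash mid-loop causes re-publication, not loss; that is why projection updates downstream must be idempotent.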

Database Design

Write store (PostgreSQL): normalized tables representing domain aggregates: orders (id, user_id, status, total, created_at), order_items (order_id, product_id, quantity, price), and outbox_events (id, aggregate_id, event_type, payload, published, created_at). The outbox table bridges the write store and the event bus. The write store is optimized for ACID transactions, not for complex queries.

Read stores (varied by query pattern): Redis for order status lookup (order:{id} → {status, summary}); Elasticsearch for product search (full-text + faceted filtering); PostgreSQL read replica for reports that need SQL but don't need real-time freshness; ClickHouse for analytical dashboards. Each projection store is sized for its workload independently — Elasticsearch scales for search throughput, Redis for point-lookup latency.

API Design

Commands and queries get separate endpoints. POST /orders submits a PlaceOrder command and returns the order id once the write-side transaction commits; GET /orders/{id} is served from the order summary projection in Redis; product search GETs hit the Elasticsearch catalog projection. Mutating verbs (POST, PUT, DELETE) never read from projections, and GET never touches the write store; the API gateway enforces this split.
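The command/query routing rule described under High-Level Architecture can be enforced with a small dispatch shim at the gateway or application layer. A minimal sketch; the two callables are hypothetical stand-ins for the command bus and the projection store client:

```python
def route(method, path, dispatch_command, query_read_model):
    """CQRS routing: mutating HTTP verbs go to the command side; GETs are
    served from read-model projections and never touch the write store."""
    if method in ("POST", "PUT", "DELETE"):
        return dispatch_command(method, path)   # write side
    if method == "GET":
        return query_read_model(path)           # read side only
    raise ValueError(f"unsupported method: {method}")
```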

Scaling & Bottlenecks

The write side bottleneck is the outbox relay latency — the time from command commit to event publication to Kafka. CDC-based relay (Debezium) achieves <100ms lag by streaming WAL changes. Polling-based relay adds latency proportional to the polling interval (typically 500ms-1s). The projection sync lag (from Kafka publish to read model update) is typically 100-500ms — acceptable for most use cases, but read-your-own-writes scenarios require special handling (read from write store immediately after a command, then switch to read model for subsequent reads).
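The read-your-own-writes handling mentioned above can be sketched as a routing decision per query. This is one possible approach, not the only one (version tokens are another); `pending_for_client` is a hypothetical per-client set of aggregate ids with recently issued commands, and plain dicts stand in for the two stores:

```python
def get_order(order_id, write_store, read_model, pending_for_client):
    """Read-your-own-writes routing: if this client just issued a command
    for the order, the projection may still lag, so read the authoritative
    write store; otherwise serve the fast read model."""
    if order_id in pending_for_client:
        return write_store.get(order_id)
    return read_model.get(order_id)
```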

Projection rebuild is a scaling challenge for large event histories. A full rebuild from 100 billion events takes days with a single builder. Parallel rebuild: partition the event log by time range and run multiple builders in parallel, each covering a time window, then merge the per-window results at the end (straightforward when projection updates are associative, such as counters; otherwise windows must be merged in time order). The total work is still O(n), but wall-clock rebuild time drops roughly by the parallelism factor.
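A parallel rebuild of the inventory projection can be sketched as follows, under the assumption that the updates are associative deltas so per-window partial results merge by addition. Threads stand in for independent builder processes:

```python
from concurrent.futures import ThreadPoolExecutor

def fold_window(window):
    """Replay one time window of InventoryAdjusted events into a partial
    projection (sku -> net delta for this window)."""
    partial = {}
    for event in window:
        sku = event["sku"]
        partial[sku] = partial.get(sku, 0) + event["delta"]
    return partial

def parallel_rebuild(events, builders=4):
    """Split the event log into contiguous time windows, fold each window
    in its own builder, then merge the partial projections at the end."""
    size = max(1, len(events) // builders)
    windows = [events[i:i + size] for i in range(0, len(events), size)]
    merged = {}
    with ThreadPoolExecutor(max_workers=builders) as pool:
        for partial in pool.map(fold_window, windows):
            for sku, delta in partial.items():
                merged[sku] = merged.get(sku, 0) + delta
    return merged
```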

Key Trade-offs

  • Eventual vs. strong consistency: CQRS inherently introduces eventual consistency between write and read sides — systems requiring always-consistent reads should consider whether CQRS is appropriate
  • Complexity vs. scalability: CQRS adds significant architectural complexity (two models, event pipeline, projection builders) that is only justified when read and write patterns diverge significantly
  • Projection granularity: One projection per query pattern maximizes read performance but multiplies the number of projections to maintain; fewer, more general projections simplify operations but may not be optimally indexed for all queries
  • Synchronous vs. asynchronous commands: Synchronous command handling (wait for persistence before returning) provides immediate confirmation; async (acknowledge before persistence, process in background) improves throughput but requires clients to poll for results
