System Design: Order Management System
System design of an order management system handling the full order lifecycle from placement through fulfillment, including distributed saga orchestration and multi-warehouse routing.
Requirements
Functional Requirements:
- Accept orders from multiple channels: web, mobile, POS, marketplace integrations
- Orchestrate the order lifecycle: placed → confirmed → processing → shipped → delivered
- Split orders across multiple warehouses based on inventory location
- Handle payment capture, partial fulfillment, and partial refunds
- Support order modifications and cancellations before shipment
- Generate invoices and integrate with shipping carriers for tracking
Non-Functional Requirements:
- Process 10,000 orders/sec at peak (e.g., flash sales, holiday events)
- Order placement to confirmation latency under 2 seconds
- 99.99% availability for order placement; 99.9% for order status queries
- Exactly-once order processing — no duplicate charges or fulfillments
- Audit trail for every state transition with immutable event log
Scale Estimation
5M orders/day average, peaking at 10K/sec during events. Each order has ~3 line items averaging 500 bytes metadata = 1.5KB per order. Daily storage: 5M × 1.5KB = 7.5GB. Annual: ~2.7TB. The event log (every state transition) generates 10 events per order lifecycle × 5M orders = 50M events/day at 200 bytes each = 10GB/day. Order status queries: 20M queries/day = 231 QPS (users checking 'where is my order').
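The back-of-envelope arithmetic above can be reproduced with a few lines of Python (all inputs are the figures stated in this section):

```python
SECONDS_PER_DAY = 24 * 60 * 60

orders_per_day = 5_000_000
order_size_bytes = 3 * 500            # ~3 line items x 500 bytes of metadata
daily_order_storage_gb = orders_per_day * order_size_bytes / 1e9   # 7.5 GB/day
annual_storage_tb = daily_order_storage_gb * 365 / 1e3             # ~2.7 TB/year

events_per_day = 10 * orders_per_day  # 10 lifecycle events per order
daily_event_gb = events_per_day * 200 / 1e9                        # 10 GB/day

status_qps = 20_000_000 / SECONDS_PER_DAY                          # ~231 QPS
```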
High-Level Architecture
The OMS follows an event-driven architecture with a saga orchestrator at its core. The order placement flow: Cart Service → Order API → Order Service (validates cart, calculates totals) → Saga Orchestrator. The Saga Orchestrator drives a multi-step distributed transaction: (1) Inventory Reservation — reserve stock via the Inventory Service; (2) Payment Authorization — authorize payment via the Payment Service; (3) Fraud Check — score the order via the Fraud Service; (4) Order Confirmation — write the confirmed order to the Order Store. Each step is idempotent with compensating actions: if payment fails, inventory is released; if fraud check fails, payment authorization is voided.
The fulfillment path is triggered by an order.confirmed event on Kafka: the Fulfillment Router determines the optimal warehouse(s) based on item availability and proximity to the shipping address. For multi-warehouse orders, the router creates sub-orders (shipments) each assigned to a warehouse. Warehouse management systems (WMS) consume shipment events, pick/pack items, and emit shipment.shipped events with carrier tracking numbers. A Tracking Aggregator polls carrier APIs and updates the order status.
All state transitions are recorded as immutable events in an event store (Kafka + PostgreSQL), enabling full audit trails and event sourcing for order reconstruction.
Core Components
Saga Orchestrator
The Saga Orchestrator is implemented as a state machine persisted in PostgreSQL. Each order creates a saga record with fields: saga_id, order_id, current_step, status, step_results JSONB, created_at, updated_at. The orchestrator advances through steps sequentially, recording the result of each step. On failure, it executes compensating transactions in reverse order. Idempotency is enforced via idempotency keys on each step — retrying a step with the same key returns the cached result. The orchestrator uses a polling pattern with a 'stuck saga detector' that alerts on sagas stalled for >5 minutes.
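A minimal sketch of the orchestration loop described above, assuming an in-memory stand-in for the PostgreSQL saga record and illustrative step names; in a real deployment each action would call the Inventory, Payment, or Fraud service:

```python
from dataclasses import dataclass, field

@dataclass
class Saga:
    saga_id: str
    order_id: str
    completed: list = field(default_factory=list)  # (step name, compensator) pairs
    results: dict = field(default_factory=dict)    # idempotency cache: key -> result
    status: str = "running"

def run_saga(saga, steps):
    """steps: ordered list of (name, action, compensate) tuples.

    Advances sequentially, caching each step's result under an idempotency
    key; on failure, runs compensating actions in reverse order.
    """
    for name, action, compensate in steps:
        key = f"{saga.saga_id}:{name}"       # per-step idempotency key
        if key in saga.results:              # a retry returns the cached result
            continue
        try:
            saga.results[key] = action(saga.order_id)
            saga.completed.append((name, compensate))
        except Exception:
            # Compensating transactions in reverse order of completion
            for _done_name, comp in reversed(saga.completed):
                comp(saga.order_id)
            saga.status = "compensated"
            return saga
    saga.status = "completed"
    return saga
```

In this sketch a failed payment authorization triggers the inventory-release compensator, matching the failure path described above.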
Fulfillment Router
The Fulfillment Router determines optimal warehouse assignment for each order. The routing algorithm: (1) Find all warehouses with sufficient stock for each line item; (2) Score each warehouse by: distance to shipping address (40% weight), current utilization/backlog (30%), shipping cost (20%), and carrier availability (10%); (3) If no single warehouse has all items, split into sub-orders minimizing the number of shipments. The router uses a cached inventory snapshot (refreshed every 30 seconds from the Inventory Service) for fast decisions, with a validation call before committing the assignment.
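The scoring step can be sketched as follows. The weights are the ones above; the field names and the assumption that each factor is pre-normalized to [0, 1] (lower is better) are illustrative:

```python
def score_warehouse(wh):
    """Weighted blend of normalized factors; lower scores are better."""
    return (0.40 * wh["distance"]
            + 0.30 * wh["utilization"]
            + 0.20 * wh["shipping_cost"]
            + 0.10 * wh["carrier_unavailability"])

def route(order_items, warehouses):
    """Pick the best single warehouse that stocks every line item in full.

    order_items: {product_id: quantity}. Returns a warehouse id, or None to
    signal that the order must be split into sub-orders.
    """
    eligible = [w for w in warehouses
                if all(w["stock"].get(item, 0) >= qty
                       for item, qty in order_items.items())]
    if eligible:
        return min(eligible, key=score_warehouse)["id"]
    return None  # no single warehouse has everything: split, minimizing shipments
```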
Order Event Store
Every order state transition emits an event to Kafka topic order-events and is persisted to a PostgreSQL events table: event_id, order_id, event_type (order.placed, order.confirmed, shipment.shipped, etc.), payload JSONB, timestamp. This append-only event log enables: (1) Full audit trail for compliance; (2) Event sourcing — reconstructing current order state by replaying events; (3) Analytics — downstream consumers (ClickHouse) build order funnel metrics. Events are retained for 7 years for regulatory compliance.
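State reconstruction by replay can be sketched as a fold over the append-only log. The event-type-to-status mapping below is an assumption based on the lifecycle described earlier:

```python
# Hypothetical mapping from event types to order status values
TRANSITIONS = {
    "order.placed": "placed",
    "order.confirmed": "confirmed",
    "order.processing": "processing",
    "shipment.shipped": "shipped",
    "shipment.delivered": "delivered",
    "order.cancelled": "cancelled",
}

def replay(events):
    """Rebuild the current order state by replaying events in timestamp order."""
    state = {"status": None, "history": []}
    for e in sorted(events, key=lambda e: e["timestamp"]):
        status = TRANSITIONS.get(e["event_type"])
        if status:
            state["status"] = status
            state["history"].append((e["timestamp"], status))
    return state
```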
Database Design
PostgreSQL schema: orders table (order_id UUID PK, customer_id, channel, status ENUM, total_amount, currency, shipping_address JSONB, billing_address JSONB, created_at, updated_at). order_items table (item_id, order_id FK, product_id, variant_id, quantity, unit_price, status). shipments table (shipment_id, order_id FK, warehouse_id, carrier, tracking_number, status, shipped_at, delivered_at). payments table (payment_id, order_id FK, amount, method, authorization_code, status, captured_at).
The orders table is partitioned by created_at (monthly partitions) to manage the growing dataset. Older partitions are moved to cold storage (S3 via pg_dump) after 2 years. Indexes: composite index on (customer_id, created_at DESC) for order history, index on status for operational queries (e.g., find all 'processing' orders).
API Design
- POST /api/v1/orders — Place an order; body contains cart_id, shipping_address, payment_method; returns order_id and saga status
- GET /api/v1/orders/{order_id} — Fetch order details with line items, shipments, and current status
- POST /api/v1/orders/{order_id}/cancel — Request cancellation; succeeds only if the order is in a cancellable state (before shipment)
- GET /api/v1/orders/{order_id}/events — Fetch the full event history for an order (audit trail)
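The cancellation rule ("before shipment") amounts to a state check. A minimal sketch, assuming the set of cancellable states inferred from the lifecycle above:

```python
# Assumed cancellable states: everything before a shipment exists
CANCELLABLE = {"placed", "confirmed", "processing"}

def cancel_order(order):
    """Handle POST /api/v1/orders/{order_id}/cancel; returns (http_status, body)."""
    if order["status"] in CANCELLABLE:
        order["status"] = "cancelled"
        return 200, {"order_id": order["order_id"], "status": "cancelled"}
    # 409 Conflict: the order has progressed past the cancellable window
    return 409, {"error": f"order not cancellable in state '{order['status']}'"}
```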
Scaling & Bottlenecks
The saga orchestrator is the throughput bottleneck since each order requires sequential step execution. Scaling strategies: (1) Horizontal scaling — saga workers are stateless and pull work from a Kafka partition; order_id-based partitioning ensures a single order's saga runs on one worker; (2) Step parallelization where possible — fraud check and payment authorization can run concurrently after inventory reservation; (3) Fast-path optimization — repeat customers with verified payment methods skip fraud check, reducing saga from 4 steps to 3.
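Step parallelization (strategy 2) can be sketched with asyncio; the coroutines below are stand-ins for the real Payment and Fraud service calls:

```python
import asyncio

async def authorize_payment(order_id):
    await asyncio.sleep(0.01)   # stand-in for the Payment Service call
    return "auth_ok"

async def fraud_check(order_id):
    await asyncio.sleep(0.01)   # stand-in for the Fraud Service call
    return "low_risk"

async def confirm(order_id):
    # After inventory reservation, payment authorization and the fraud check
    # are independent, so they run concurrently rather than sequentially.
    payment, fraud = await asyncio.gather(
        authorize_payment(order_id), fraud_check(order_id)
    )
    return payment == "auth_ok" and fraud == "low_risk"
```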
Database write throughput during peak order placement (10K/sec) requires PostgreSQL optimizations: batch inserts (group 100 orders into a single transaction), connection pooling via PgBouncer (200 connections), and synchronous replication to a single standby (async to others) for durability without excessive write latency. The event store Kafka topic uses 64 partitions to parallelize downstream consumption.
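The batching idea can be sketched with a small accumulator; `flush_fn` is a stand-in for the real multi-row INSERT committed as one transaction:

```python
class OrderBatcher:
    """Groups incoming orders and flushes them in batches of up to `batch_size`
    (100 in the text), so 10K orders/sec becomes ~100 transactions/sec."""

    def __init__(self, flush_fn, batch_size=100):
        self.flush_fn = flush_fn
        self.batch_size = batch_size
        self.pending = []

    def add(self, order):
        self.pending.append(order)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.flush_fn(self.pending)  # e.g. one multi-row INSERT + COMMIT
            self.pending = []
```

A real implementation would also flush on a short timer so a trickle of orders is never held back waiting for a full batch.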
Key Trade-offs
- Saga over 2PC: Saga allows independent service scaling and tolerates long-running transactions (fraud checks can take seconds), but requires careful design of compensating transactions and idempotency
- Event sourcing for orders: Full audit trail and state reconstruction capability, but increases storage and adds complexity in querying current state (mitigated by materialized views)
- 30-second inventory snapshot in fulfillment router: Fast routing decisions at the cost of occasional stale inventory — validation before commitment catches most errors
- Monthly table partitioning: Manageable partition sizes for archival, but cross-partition queries (e.g., customer order history spanning years) require partition pruning or a separate read model