System Design: Amazon E-Commerce Platform

Requirements

Functional Requirements:

Users can search and browse a catalog of 350+ million products
Users can add items to cart, apply coupons, and complete checkout
Sellers can list products, manage inventory, and fulfill orders
Personalized product recommendations on every page
Real-time order tracking from placement to delivery
Customer reviews and ratings with verified purchase badges

Non-Functional Requirements:

300M active customers, 50M DAU; peak 50K orders/sec during Prime Day
Product page load under 200ms (p99); checkout flow under 500ms end-to-end
99.999% availability for the checkout path — downtime costs $220K/minute
Strong consistency for inventory and payments; eventual consistency for reviews and recommendations
Support 20+ regional marketplaces with localized pricing and tax rules

Scale Estimation

With 50M DAU each viewing an average of 30 pages, the system serves 1.5 billion page views/day, or about 17,400 requests/sec on average. Search: 20M daily searchers × 5 queries each = 100M searches/day, or about 1,160 QPS. Orders: 1.5M orders/day on average (roughly 17 orders/sec), spiking to 50K orders/sec during Prime Day. Product catalog: 350M products × 2KB of metadata = 700GB of base data; images, at 5 per product and 200KB each, add 350TB of assets. The recommendation engine processes 500M click events/day for model training.
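
The arithmetic above can be reproduced with a short script. This is only a back-of-the-envelope sketch: the variable names are illustrative, decimal units are assumed (1KB = 1,000 bytes), and the rates are daily averages rather than peaks.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(events_per_day: float) -> float:
    """Convert a daily volume into an average per-second rate."""
    return events_per_day / SECONDS_PER_DAY


# Inputs taken from the estimates in the text.
page_views_per_day = 50_000_000 * 30       # 50M DAU x 30 pages each
searches_per_day = 20_000_000 * 5          # 20M searchers x 5 queries each
orders_per_day = 1_500_000

catalog_bytes = 350_000_000 * 2_000        # 2KB of metadata per product
image_bytes = 350_000_000 * 5 * 200_000    # 5 images x 200KB per product

page_rps = per_second(page_views_per_day)
search_qps = per_second(searches_per_day)
order_rate = per_second(orders_per_day)

if __name__ == "__main__":
    print(f"page views: {page_rps:,.0f} req/s")
    print(f"searches:   {search_qps:,.0f} QPS")
    print(f"orders:     {order_rate:,.1f} orders/s")
    print(f"catalog:    {catalog_bytes / 1e9:,.0f} GB")
    print(f"images:     {image_bytes / 1e12:,.0f} TB")
```

The average order rate comes out near 17 orders/sec, which shows how far the 50K orders/sec Prime Day peak sits above a normal day.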

High-Level Architecture

Amazon's architecture is the canonical example of microservices at scale — reportedly over 1,000 services in production. The customer-facing path flows: CloudFront CDN → Application Load Balancer → API Gateway (authentication, rate limiting, routing) → individual microservices. The Product Catalog Service reads from a distributed document store (DynamoDB) with an Elasticsearch cluster for search. The Cart Service uses an in-memory store (ElastiCache Redis) for active carts with DynamoDB as a durable backing store. The Order Service orchestrates checkout via a saga pattern across Inventory, Payment, and Fulfillment services.

The seller-facing path uses a separate API Gateway routing to Seller Central services. Inventory updates from sellers flow through an SQS queue to the Inventory Service, which maintains stock counts in a DynamoDB table with conditional writes to prevent overselling. Price changes propagate through an SNS topic to downstream consumers including the Search Index, Recommendation Service, and the Buy Box algorithm.

A separate analytics plane powered by Kinesis Data Streams ingests all clickstream data into S3 data lakes, feeding Spark-based ML pipelines for recommendation model training. The trained models are deployed to SageMaker endpoints serving real-time inference for the Recommendation Service.

Core Components

Product Catalog Service

The catalog stores 350M+ products in DynamoDB with product_id as the partition key. Each item contains title, description, bullet points, category path, seller_id, and a map-typed attributes field for category-specific attributes (e.g., screen size for electronics). The catalog supports multi-tenant access: Amazon retail and third-party sellers write to the same store. An Elasticsearch cluster (100+ nodes) indexes product data for full-text search with field boosting on title (3x), brand (2x), and description (1x). Search results are re-ranked by a buy-box-aware algorithm factoring price, seller rating, and fulfillment method.
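
One way to express the field boosting is an Elasticsearch multi_match query, where a caret suffix sets the per-field boost. The sketch below only builds the request body. The helper name build_search_query and the index field names title, brand, and description are assumptions for illustration, and the buy-box re-ranking step is not shown.

```python
from typing import Any


def build_search_query(text: str, page: int = 1, size: int = 20) -> dict[str, Any]:
    """Build an Elasticsearch request body with per-field boosts.

    Field names are assumed; "title^3" means matches on title count 3x.
    """
    return {
        "from": (page - 1) * size,  # offset of the first hit on this page
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                "fields": ["title^3", "brand^2", "description"],
            }
        },
    }


if __name__ == "__main__":
    import json

    print(json.dumps(build_search_query("usb c cable", page=2), indent=2))
```

The resulting dictionary would be sent as the body of a search request against the product index; description carries the default boost of 1, so it needs no suffix.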

Checkout & Payment Service

Checkout is implemented as a distributed saga spanning 6 services: (1) Cart Validation — verify all items are still available and prices haven't changed; (2) Address Service — validate shipping address and calculate shipping options; (3) Tax Service — compute taxes based on nexus rules; (4) Inventory Reservation — place a soft hold using DynamoDB conditional writes with a 10-minute TTL; (5) Payment Authorization — tokenized card authorization via a PCI-compliant payment gateway; (6) Order Creation — write the confirmed order. If any step fails, compensating transactions roll back prior steps. The saga coordinator uses a Step Functions-style state machine persisted in DynamoDB.
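
The compensation logic can be sketched as a small in-process coordinator. This is a minimal illustration, not the production design: SagaStep, run_saga, and the step names are hypothetical, only three of the six steps are modeled, and state persistence, retries, and timeouts are left out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SagaStep:
    name: str
    action: Callable[[dict], None]      # forward operation
    compensate: Callable[[dict], None]  # undo for a completed action


def run_saga(steps: list[SagaStep], ctx: dict) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse."""
    completed: list[SagaStep] = []
    for step in steps:
        try:
            step.action(ctx)
        except Exception as exc:
            ctx["failed_step"] = step.name
            ctx["error"] = str(exc)
            for done in reversed(completed):
                done.compensate(ctx)
            return False
        completed.append(step)
    return True


def demo(payment_ok: bool) -> list[str]:
    """Simulate a checkout where payment authorization may fail."""
    log: list[str] = []

    def authorize(ctx: dict) -> None:
        if not payment_ok:
            raise RuntimeError("card declined")
        log.append("authorize_payment")

    steps = [
        SagaStep("inventory", lambda c: log.append("reserve_inventory"),
                 lambda c: log.append("release_inventory")),
        SagaStep("payment", authorize,
                 lambda c: log.append("void_authorization")),
        SagaStep("order", lambda c: log.append("create_order"),
                 lambda c: log.append("cancel_order")),
    ]
    log.append("ok" if run_saga(steps, {}) else "rolled_back")
    return log


if __name__ == "__main__":
    print(demo(payment_ok=True))
    print(demo(payment_ok=False))
```

When payment fails, only the inventory hold is released, because the failed step never completed and so has nothing to compensate. A real coordinator would also persist each transition so that a crash mid-saga can be resumed or rolled back.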

Recommendation Engine

Amazon's recommendation engine uses a hybrid approach combining collaborative filtering (item-to-item via matrix factorization) and content-based signals (product attribute embeddings). The offline pipeline runs on Spark, processing 500M daily click events to retrain models every 4 hours. Online serving uses a two-stage pipeline: a fast retrieval layer using ANN search over item embeddings (FAISS on GPU instances) retrieves 500 candidates, then a ranking model (gradient-boosted trees on features like purchase history, browsing context, price sensitivity) selects the top 20. Results are cached per user in Redis with a 15-minute TTL.
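
The two-stage shape of the serving path can be illustrated with a toy example. Note the substitutions: exact brute-force cosine similarity stands in for FAISS approximate search, and a hand-written linear score stands in for the gradient-boosted ranking model. The feature names purchase_affinity and price_gap, and the weights, are made up for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(user_vec: list[float], item_vecs: dict[str, list[float]], k: int) -> list[str]:
    """Stage 1: pick the k items whose embeddings are closest to the user."""
    by_similarity = sorted(
        item_vecs, key=lambda i: cosine(user_vec, item_vecs[i]), reverse=True
    )
    return by_similarity[:k]


def rank(candidates: list[str], features: dict[str, dict[str, float]], n: int) -> list[str]:
    """Stage 2: re-score candidates with richer features and keep the top n."""
    def score(item_id: str) -> float:
        f = features[item_id]
        # Placeholder linear model; a trained ranker would go here.
        return 0.7 * f["purchase_affinity"] - 0.3 * f["price_gap"]

    return sorted(candidates, key=score, reverse=True)[:n]


def recommend(user_vec, item_vecs, features, k: int = 500, n: int = 20) -> list[str]:
    """Retrieve k candidates cheaply, then rank them down to n results."""
    return rank(retrieve(user_vec, item_vecs, k), features, n)


if __name__ == "__main__":
    item_vecs = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
    features = {
        "a": {"purchase_affinity": 0.2, "price_gap": 0.5},
        "b": {"purchase_affinity": 0.9, "price_gap": 0.1},
        "c": {"purchase_affinity": 0.5, "price_gap": 0.0},
    }
    print(recommend([1.0, 0.0], item_vecs, features, k=2, n=1))
```

The split matters because the ranking features are expensive to compute: the retrieval stage narrows hundreds of millions of items to a few hundred so the ranker only scores a small candidate set.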

Database Design

The product catalog uses DynamoDB with a GSI on category_id for browse-tree navigation. The Orders table uses order_id as the partition key with a GSI on customer_id + created_at for order history queries. Inventory is stored in a DynamoDB table with product_id + fulfillment_center_id as the composite key; stock counts use atomic counters with conditional updates (decrement only if count >= requested quantity).
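
The conditional decrement can be sketched in two parts: the parameters of a DynamoDB UpdateItem request, and an in-memory model of the same check-then-decrement semantics. The table name Inventory and the attribute name stock_count are assumptions, and the request is only built here, not sent.

```python
import threading
from typing import Any


def reserve_stock_request(product_id: str, fc_id: str, quantity: int) -> dict[str, Any]:
    """Parameters for a DynamoDB UpdateItem call (low-level client format).

    The write is rejected by the condition if stock is below the quantity.
    """
    return {
        "TableName": "Inventory",  # hypothetical table name
        "Key": {
            "product_id": {"S": product_id},
            "fulfillment_center_id": {"S": fc_id},
        },
        "UpdateExpression": "SET #stock = #stock - :qty",
        "ConditionExpression": "#stock >= :qty",
        "ExpressionAttributeNames": {"#stock": "stock_count"},  # assumed attribute
        "ExpressionAttributeValues": {":qty": {"N": str(quantity)}},
    }


class InMemoryInventory:
    """Local stand-in that mimics the atomic conditional update."""

    def __init__(self) -> None:
        self._stock: dict[tuple[str, str], int] = {}
        self._lock = threading.Lock()

    def set_stock(self, product_id: str, fc_id: str, count: int) -> None:
        with self._lock:
            self._stock[(product_id, fc_id)] = count

    def reserve(self, product_id: str, fc_id: str, quantity: int) -> bool:
        """Decrement only if enough stock remains; never go negative."""
        with self._lock:
            current = self._stock.get((product_id, fc_id), 0)
            if current < quantity:
                return False
            self._stock[(product_id, fc_id)] = current - quantity
            return True

    def stock(self, product_id: str, fc_id: str) -> int:
        with self._lock:
            return self._stock.get((product_id, fc_id), 0)


if __name__ == "__main__":
    inv = InMemoryInventory()
    inv.set_stock("product_123", "fc_1", 3)
    print(inv.reserve("product_123", "fc_1", 2), inv.reserve("product_123", "fc_1", 2))
```

Because the check and the decrement happen in one conditional write, two concurrent buyers cannot both take the last unit; the loser's write fails and the checkout can report the item as unavailable.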

For the search index, Elasticsearch stores a denormalized product document including seller info, pricing, and availability. The index is updated via a CDC pipeline from DynamoDB Streams → Lambda → Elasticsearch, with average indexing lag under 5 seconds. Customer data (profiles, addresses, payment methods) resides in a separate encrypted RDS PostgreSQL cluster with field-level encryption for PII columns.

API Design

GET /api/v1/products/search?q={query}&category={id}&sort=relevance&page=1&size=20 — Search products with faceted filtering; returns ranked results with Buy Box winner
POST /api/v1/cart/items — Add item to cart; body contains product_id, quantity, seller_id; returns updated cart with price breakdown
POST /api/v1/orders/checkout — Initiate checkout saga; body contains cart_id, shipping_address_id, payment_method_id; returns order_id and saga status
GET /api/v1/orders/{order_id}/tracking — Real-time order tracking with fulfillment status and estimated delivery

Scaling & Bottlenecks

Prime Day scaling is Amazon's defining challenge — traffic increases 10x over baseline. The system uses pre-warming: auto-scaling groups are scaled up 2 hours before the event, DynamoDB tables switch to on-demand capacity mode, and CDN cache is pre-populated with deal page assets. The checkout path uses a separate reserved capacity pool isolated from browse traffic to ensure orders complete even under extreme load.

The inventory hot-partition problem occurs when a viral product concentrates all writes on a single DynamoDB partition key. Amazon mitigates this with write sharding: the inventory count for hot items is split across N shard keys (e.g., product_123_shard_0 through product_123_shard_9), with reads aggregating across all shards. This spreads write throughput across partitions at the cost of more expensive reads.
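
A minimal in-memory sketch of write sharding follows. The ShardedCounter class is hypothetical, and it adds two details the text does not specify: stock is split evenly across shards up front, and a decrement that finds its randomly chosen shard short falls through to the next shard.

```python
import random
from typing import Optional

NUM_SHARDS = 10


def shard_key(product_id: str, shard: int) -> str:
    """Key format from the text, e.g. product_123_shard_0."""
    return f"{product_id}_shard_{shard}"


class ShardedCounter:
    """Stock for one hot product, spread across several partition keys."""

    def __init__(self, product_id: str, total_stock: int,
                 num_shards: int = NUM_SHARDS,
                 rng: Optional[random.Random] = None) -> None:
        base, extra = divmod(total_stock, num_shards)
        # The first `extra` shards carry one additional unit each.
        self.shards = {
            shard_key(product_id, i): base + (1 if i < extra else 0)
            for i in range(num_shards)
        }
        self._keys = list(self.shards)
        self._rng = rng or random.Random()

    def decrement(self, quantity: int = 1) -> bool:
        """Take stock from a random shard, trying the others if it is short."""
        start = self._rng.randrange(len(self._keys))
        for offset in range(len(self._keys)):
            key = self._keys[(start + offset) % len(self._keys)]
            if self.shards[key] >= quantity:
                self.shards[key] -= quantity
                return True
        return False

    def total(self) -> int:
        """Reads must aggregate across every shard."""
        return sum(self.shards.values())


if __name__ == "__main__":
    counter = ShardedCounter("product_123", total_stock=25)
    sold = sum(counter.decrement() for _ in range(30))
    print(sold, counter.total())
```

One limitation of this simple scheme: an order for more units than any single shard holds fails even when the combined total would cover it, so a real system would need to split such orders across shards or rebalance.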

Key Trade-offs

DynamoDB over relational DB for catalog: Schemaless design handles 350M products with heterogeneous attributes, but sacrifices complex query capability — Elasticsearch fills that gap
Saga over 2PC for checkout: Saga with compensating transactions allows each service to scale independently and tolerate partial failures, but requires idempotent operations and careful failure handling
Write sharding for hot inventory items: Distributing counts across shards prevents hot partition throttling during flash sales, but aggregation on reads adds latency
4-hour recommendation model refresh: Balances freshness with compute cost; real-time signals (current session clicks) are injected at serving time to compensate for model staleness
