System Design: Lyft
Explore how Lyft's ride-sharing architecture handles real-time driver matching, dynamic pricing, and high-availability dispatch at scale. A comprehensive guide for system design interview preparation.
Requirements
Functional Requirements:
- Riders can request rides with pickup and destination; system matches a nearby driver
- Drivers receive trip offers and can accept or decline within a timeout window
- Live map shows driver approach and trip progress for both parties
- Fare is calculated using base rate, per-mile, per-minute, and surge multiplier
- Riders can schedule rides up to 7 days in advance
- Both parties can cancel with configurable cancellation fee logic
Non-Functional Requirements:
- Match a rider to a driver in under 3 seconds at the 95th percentile
- 99.95% uptime with active-active multi-region deployment
- Sustain 500,000 concurrent rider and driver sessions during peak hours
- Location update pipeline processes 100,000 GPS events per second at peak (one ping per driver every 5 seconds)
- Payment processing complies with PCI-DSS Level 1
Scale Estimation
Lyft handles roughly 2 million rides per day in the US market. Peak demand occurs on Friday/Saturday evenings, when requests spike to ~10,000/minute. With ~500,000 active drivers each pinging their location every 5 seconds, the inbound rate is 100,000 GPS events/second. Storage: trip records at 2 KB each generate ~4 GB/day; location telemetry at 50 bytes/ping generates ~430 GB/day before compression (typically 10:1 with LZ4), leaving ~43 GB/day persisted.
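These figures fall out of simple arithmetic; a quick sanity check of the estimates above:

```python
# Back-of-the-envelope check of the scale estimates above.
ACTIVE_DRIVERS = 500_000
PING_INTERVAL_S = 5
gps_events_per_sec = ACTIVE_DRIVERS / PING_INTERVAL_S          # 100,000/s

RIDES_PER_DAY = 2_000_000
TRIP_RECORD_BYTES = 2 * 1024
trip_storage_gb_day = RIDES_PER_DAY * TRIP_RECORD_BYTES / 1e9  # ~4.1 GB/day

PING_BYTES = 50
telemetry_gb_day = gps_events_per_sec * 86_400 * PING_BYTES / 1e9  # ~432 GB/day
persisted_gb_day = telemetry_gb_day / 10                       # ~43 GB/day after ~10:1 LZ4

print(f"{gps_events_per_sec:,.0f} GPS events/s, "
      f"{trip_storage_gb_day:.1f} GB/day trips, {persisted_gb_day:.0f} GB/day telemetry")
```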
High-Level Architecture
Lyft's platform is split into three logical planes: the Supply Plane (driver location tracking and availability), the Demand Plane (rider requests and fare estimation), and the Matching Plane (joining supply to demand with minimum ETA). All three communicate asynchronously through Kafka topics and synchronously via gRPC for latency-sensitive paths.
The Supply Plane runs a Location Service that ingests GPS pings from driver apps over long-lived gRPC streaming connections. Pings are written to a Redis geospatial index (sharded along S2-geometry cell boundaries) and also fanned out to a Kafka topic for analytics, surge calculation, and ETL pipelines. Driver availability state (online/offline/on-trip) is maintained in a distributed cache with TTL-based expiry: if a driver stops pinging for 60 seconds, they are automatically marked offline.
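A minimal sketch of the two writes this implies, using redis-py; the key names and local connection are illustrative, while the 60-second TTL comes from the prose:

```python
import time
import redis

r = redis.Redis()  # assumes a local Redis for illustration

def ingest_ping(city: str, driver_id: str, lat: float, lng: float) -> None:
    """Record one driver GPS ping: update the geo index and refresh liveness."""
    # Redis GEO stores members in a sorted set, keyed here by city shard.
    # (execute_command keeps this compatible across redis-py versions.)
    r.execute_command("GEOADD", f"supply:{city}", lng, lat, driver_id)
    # Availability with TTL-based expiry: if the driver stops pinging for
    # 60 seconds, the key lapses and the driver is treated as offline.
    r.set(f"driver:{driver_id}:online", int(time.time()), ex=60)

def is_online(driver_id: str) -> bool:
    return r.exists(f"driver:{driver_id}:online") == 1
```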
When a ride request arrives, the Matching Service queries the supply index for the nearest N available drivers, runs parallel ETA calculations against a pre-warmed OSRM routing cluster, ranks candidates, and dispatches the top match via a push notification through APNs/FCM. The entire match cycle targets under 1 second of server-side processing.
Core Components
Location Ingestion Service
Built on a fleet of gRPC servers behind an NLB (Network Load Balancer), the Location Ingestion Service receives ~100,000 pings/second at peak. Pings are validated, de-duplicated (using driver_id + sequence_number), written to Redis GEO sorted sets, and forwarded to Kafka. The service is stateless and auto-scales via Kubernetes HPA based on CPU and incoming connection count. A circuit breaker prevents Redis overload from cascading into the gRPC layer.
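The de-duplication step reduces to a monotonic sequence check per driver. A sketch, with an in-memory dict standing in for whatever shared store the real service would use:

```python
# Last-seen sequence number per driver; a real deployment would keep this
# in Redis or a per-partition consumer so restarts don't re-admit duplicates.
_last_seq: dict[str, int] = {}

def accept_ping(driver_id: str, sequence_number: int) -> bool:
    """Drop duplicate or out-of-order pings (upstream delivery is at-least-once)."""
    last = _last_seq.get(driver_id, -1)
    if sequence_number <= last:
        return False          # duplicate or stale: already processed
    _last_seq[driver_id] = sequence_number
    return True
```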
Matching & Dispatch Service
The Matching Service implements a two-phase dispatch: first a coarse geospatial lookup (S2 cells at level 13, ~1 km²) narrows drivers to a candidate pool of 20–50, then a fine-grained ETA computation (parallelized across 4 routing workers per request) produces a ranked list. The final offer is sent to the top driver; a fallback timer of 8 seconds triggers re-dispatch to the next candidate. All state is persisted in a Cassandra trip table, making the Matching Service stateless and restartable.
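A sketch of that two-phase flow; get_candidates, compute_eta, and offer_and_wait are hypothetical stubs standing in for the geo index, routing cluster, and push/offer path:

```python
import concurrent.futures

OFFER_TIMEOUT_S = 8   # fallback timer from the prose
CANDIDATE_POOL = 50

def dispatch(request, get_candidates, compute_eta, offer_and_wait):
    """Two-phase dispatch: coarse geo lookup, parallel ETA ranking, then
    sequential offers with an 8 s fallback to the next candidate."""
    # Phase 1: coarse S2-cell lookup narrows supply to a small pool.
    candidates = get_candidates(request.pickup, limit=CANDIDATE_POOL)
    # Phase 2: fine-grained ETAs computed in parallel against routing workers.
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
        ranked = list(pool.map(lambda d: (compute_eta(d, request.pickup), d),
                               candidates))
    # Offer to drivers in ETA order; re-dispatch on timeout or decline.
    for eta, driver in sorted(ranked, key=lambda t: t[0]):
        if offer_and_wait(driver, request, timeout_s=OFFER_TIMEOUT_S):
            return driver
    return None  # no driver accepted; caller widens the search radius
```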
Surge Pricing Engine
Every 60 seconds, a Surge Calculator job reads demand (open requests) and supply (available drivers) counts per geofenced zone from Kafka aggregations. It computes a multiplier using a piecewise linear function capped at 3x by policy. Multipliers are published to a Redis hash that the Fare Service reads on every new ride request. A smoothing algorithm prevents sudden multiplier jumps that frustrate riders.
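A sketch of one plausible multiplier function; the 3x cap comes from the prose, while the breakpoints and the exponential-smoothing weight are assumptions:

```python
SURGE_CAP = 3.0        # policy cap from the prose
SMOOTHING_ALPHA = 0.3  # EMA weight; one plausible smoothing choice

def raw_multiplier(open_requests: int, available_drivers: int) -> float:
    """Piecewise linear in the demand/supply ratio; breakpoints are illustrative."""
    ratio = open_requests / max(available_drivers, 1)
    if ratio <= 1.0:
        return 1.0                       # supply covers demand: no surge
    if ratio <= 2.0:
        return 1.0 + (ratio - 1.0)       # 1.0x -> 2.0x as ratio goes 1 -> 2
    return min(SURGE_CAP, 2.0 + 0.5 * (ratio - 2.0))

def smoothed_multiplier(previous: float, open_requests: int, drivers: int) -> float:
    """Exponential smoothing so the published multiplier never jumps abruptly."""
    target = raw_multiplier(open_requests, drivers)
    return round(previous + SMOOTHING_ALPHA * (target - previous), 2)
```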
Database Design
Trip records are stored in Cassandra: a primary table partitioned on trip_id serves trip lookups, and denormalized tables partitioned on rider_id and driver_id (with trip start time as a clustering key) serve history scans, since Cassandra favors one table per query pattern. A separate PostgreSQL (Aurora) cluster stores user accounts, payment methods, and promo codes where ACID guarantees are needed. Driver earnings and payout schedules use a double-entry ledger table in Aurora to prevent accounting errors. Ride location history is archived to S3 in Parquet format daily for GDPR-compliant deletion and analytics via Athena.
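A minimal sketch of the double-entry invariant, using stdlib sqlite3 in place of Aurora; the 25% take rate and the account naming are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stands in for the Aurora ledger
conn.execute("""CREATE TABLE ledger (
    entry_id     INTEGER PRIMARY KEY,
    txn_id       TEXT NOT NULL,
    account      TEXT NOT NULL,
    amount_cents INTEGER NOT NULL   -- positive = credit, negative = debit
)""")

def record_fare(txn_id: str, rider: str, driver: str, fare_cents: int,
                take_rate: float = 0.25) -> None:
    """Post one fare as balanced ledger entries inside a single transaction."""
    driver_cut = int(fare_cents * (1 - take_rate))
    with conn:  # all rows commit or none do
        conn.executemany(
            "INSERT INTO ledger (txn_id, account, amount_cents) VALUES (?, ?, ?)",
            [(txn_id, f"rider:{rider}",   -fare_cents),
             (txn_id, f"driver:{driver}",  driver_cut),
             (txn_id, "platform:revenue",  fare_cents - driver_cut)])
        # Invariant: every transaction's entries sum to zero.
        total = conn.execute(
            "SELECT SUM(amount_cents) FROM ledger WHERE txn_id = ?",
            (txn_id,)).fetchone()[0]
        assert total == 0, "unbalanced ledger entry"
```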
API Design
- POST /v1/rideRequests — Accepts pickup/dropoff coordinates and ride type (Standard, XL, Lux); returns request_id, fare estimate, and driver ETA (see the example after this list)
- GET /v1/rideRequests/{id}/status — Long-poll endpoint returning current FSM state, driver location, and updated ETA; supports SSE for streaming updates
- POST /v1/drivers/heartbeat — Driver SDK reports lat/lng, heading, speed, and availability status every 5 seconds
- DELETE /v1/rideRequests/{id} — Cancels an active request; returns applicable cancellation fee
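An illustrative call against the first endpoint; the host, auth scheme, and any field names beyond those listed above are assumptions:

```python
import requests

# Illustrative request/response shape for POST /v1/rideRequests.
resp = requests.post(
    "https://api.example.com/v1/rideRequests",
    headers={"Authorization": "Bearer <token>"},
    json={
        "pickup":    {"lat": 37.7749, "lng": -122.4194},
        "dropoff":   {"lat": 37.8044, "lng": -122.2712},
        "ride_type": "Standard",
    },
    timeout=5,
)
resp.raise_for_status()
body = resp.json()  # e.g. {"request_id": ..., "fare_estimate": ..., "driver_eta_s": ...}
```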
Scaling & Bottlenecks
The Redis geospatial layer handles the highest write throughput. Lyft shards it by metropolitan area (each city cluster owns its own Redis primary), so no cross-region coordination is needed for matching. Read replicas serve ETA fan-out queries to keep latency under 2ms. The matching service scales horizontally; Kubernetes auto-scales from 20 to 200 pods during surge events using custom Kafka consumer-lag metrics.
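City-based sharding reduces to a routing lookup in the client; a sketch in which the hostnames and city keys are placeholders:

```python
import redis

# Each metro owns its own Redis primary; this mapping is illustrative.
_shards = {
    "sfo": redis.Redis(host="redis-sfo-primary"),
    "nyc": redis.Redis(host="redis-nyc-primary"),
}

def supply_shard(city: str) -> redis.Redis:
    """Route all supply reads/writes for a city to that city's own cluster,
    so matching never needs cross-region coordination."""
    return _shards[city]
```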
Scheduled rides introduce a different scaling challenge: a batch scheduler must wake up and trigger matching ~10 minutes before pickup, which can create bursty load. This is handled by separate SQS queues with dedicated worker pools for scheduled vs. on-demand requests (SQS has no native priority levels, so isolation comes from the queue split), preventing scheduled-ride processing from interfering with real-time dispatch latency.
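A sketch of the hand-off under these assumptions, using boto3; the queue URL is a placeholder, and the ~10-minute wake-up maps onto SQS's DelaySeconds (capped at 15 minutes):

```python
import time
import boto3

sqs = boto3.client("sqs")
SCHEDULED_QUEUE = "https://sqs.us-east-1.amazonaws.com/123456789012/scheduled-rides"  # placeholder

def enqueue_for_dispatch(request_id: str, pickup_epoch_s: float) -> None:
    """Hand a scheduled ride to its dedicated queue ~10 minutes before pickup.
    SQS DelaySeconds maxes out at 900 s, so the batch scheduler only
    enqueues a ride once it is within that window."""
    delay = max(0, int(pickup_epoch_s - time.time()) - 600)  # fire ~10 min early
    sqs.send_message(
        QueueUrl=SCHEDULED_QUEUE,
        MessageBody=request_id,
        DelaySeconds=min(delay, 900),
    )
```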
Key Trade-offs
- S2 geometry vs. geohash — S2 cells have near-uniform area at each level (better matching fairness) but are computationally more expensive to intersect; Lyft accepts the CPU cost for match quality
- Stateless matching vs. stateful reservations — stateless matching simplifies scaling but requires Cassandra reads on every dispatch cycle; a read cache (Redis) of active trips mitigates this
- Aggressive pre-computation of ETAs — caching ETAs for popular driver-to-pickup corridors reduces latency but can serve stale estimates during accidents or road closures; a staleness TTL of 30 seconds balances freshness and cost (see the sketch after this list)
- APNs/FCM vs. WebSocket for driver offers — push notifications are fire-and-forget (low server state) but have variable delivery latency; Lyft maintains a fallback WebSocket for drivers in areas with unreliable connectivity
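For the ETA pre-computation trade-off, a sketch of a corridor cache enforcing the 30-second staleness TTL; compute_eta is a hypothetical routing-cluster call:

```python
import redis

r = redis.Redis()
ETA_TTL_S = 30  # staleness bound from the trade-off above

def cached_eta(origin_cell: str, dest_cell: str, compute_eta) -> int:
    """Serve a pre-computed corridor ETA while fresh; recompute otherwise."""
    key = f"eta:{origin_cell}:{dest_cell}"
    cached = r.get(key)
    if cached is not None:
        return int(cached)
    eta = compute_eta(origin_cell, dest_cell)  # routing-cluster call (stub)
    r.setex(key, ETA_TTL_S, eta)               # expires after 30 s
    return eta
```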