System Design: Carpooling Platform

Design a carpooling platform that matches commuters sharing similar routes — covering route similarity algorithms, scheduling, trust mechanisms, and real-time coordination.

14 min read · Updated Jan 15, 2025

Tags: system-design, carpooling, matching, routing, commute, social

Requirements

Functional Requirements:

  • Drivers post regular commute routes with recurring schedule (e.g., Mon–Fri 8 AM)
  • Riders search for carpools by matching origin/destination and schedule
  • System computes route similarity and detour cost for driver-rider matching
  • In-app communication between matched users before the ride
  • Rating and trust system: identity verification, trip history, reviews
  • Cost splitting: platform suggests fair cost share based on distance and fuel

Non-Functional Requirements:

  • Route similarity matching returns results in under 3 seconds for any search
  • Support 10 million registered users with 500,000 daily active searches
  • Match quality: driver detour should not exceed 15% of original route duration
  • High availability for trip day-of coordination (communication, tracking)
  • GDPR-compliant: user location data deleted after 90 days

Scale Estimation

10 million users; 500,000 daily searches; 200,000 active carpool offers posted/day. Each offer contains an encoded route polyline (~500 bytes) + schedule metadata = ~1 KB. Total offer storage: 200,000 × 1 KB = 200 MB/day; with 30-day rolling window = ~6 GB active offers. Search load: 500,000 searches/day = ~6 searches/second average; morning peak (7–9 AM) = ~60 searches/second. Route matching is computationally expensive: comparing a query route against 200,000 active offers naively is infeasible; spatial indexing reduces the candidate set to ~1,000 per query.
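The estimation above can be reproduced as a short back-of-envelope calculation (the inputs are the article's assumed numbers, not measurements):

```python
# Back-of-envelope numbers from the scale estimation above.
offers_per_day = 200_000
offer_size_bytes = 1_000              # ~1 KB: polyline (~500 B) + schedule metadata

daily_offer_storage_mb = offers_per_day * offer_size_bytes / 1_000_000   # 200.0 MB/day
active_offer_storage_gb = daily_offer_storage_mb * 30 / 1_000            # 6.0 GB (30-day window)

searches_per_day = 500_000
avg_searches_per_sec = searches_per_day / 86_400     # ~6 searches/second average
peak_searches_per_sec = avg_searches_per_sec * 10    # assumed 10x morning peak, ~60/s
```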

High-Level Architecture

The carpooling platform is organized around three core workflows: Offer Management (drivers posting routes), Search & Matching (riders finding compatible offers), and Trip Coordination (day-of logistics and communication).

Drivers post offers via the Offer Service, which encodes the route as an H3 corridor (a set of H3 cells at resolution 8 that the route passes through) and stores both the raw route and H3 corridor set in PostgreSQL with PostGIS. The H3 corridor enables fast candidate filtering: a rider's query route is also encoded to H3 cells, and offers sharing ≥2 H3 cells with the query are candidate matches.
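The candidate-filtering idea can be sketched as follows. A real implementation would use the h3 library to index route points at resolution 8; here a plain lat/lng grid of roughly the same cell size stands in for H3 so the overlap test is self-contained (the grid size and function names are illustrative assumptions):

```python
# Simplified stand-in for H3 corridor filtering. Production code would call
# the h3 library to map each route point to a resolution-8 cell; here we snap
# points to a ~0.9 km lat/lng grid to illustrate the same overlap test.
CELL_DEG = 0.008  # rough, hypothetical grid pitch (~0.9 km in latitude)

def route_to_cells(route):
    """Encode a route (list of (lat, lng) points) as a set of grid cells."""
    return {(round(lat / CELL_DEG), round(lng / CELL_DEG)) for lat, lng in route}

def is_candidate(query_cells, offer_cells, min_overlap=2):
    """Offers sharing >= min_overlap cells with the query route are candidates."""
    return len(query_cells & offer_cells) >= min_overlap
```

The same set-overlap semantics map directly to the PostGIS storage described later: `h3_cells && query_h3_cells` over a GIN index.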

The Matching Service takes a set of candidate offers and computes detailed compatibility: pickup/dropoff proximity, schedule overlap, detour percentage (using the routing API to compute driver's original route duration vs. with detour), and seat availability. Ranked results are returned to the rider with estimated cost share.

Core Components

Route Encoding & Similarity

Route similarity is computed in two phases. Phase 1 (fast filter): both routes are encoded to H3 cells at resolution 8 (cell diameter ~0.9 km). Offers with <2 overlapping cells are discarded — they are too geographically different to be compatible. Phase 2 (precise detour calculation): for each candidate, the routing API computes the driver's original route time vs. the time with a detour to pick up and drop off the rider. If detour_ratio = (detour_time / original_time - 1) > 0.15 (15%), the offer is filtered out. The remaining offers are ranked by composite score: detour_ratio (40%), schedule compatibility (30%), pickup walking distance (20%), driver rating (10%).
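The phase-2 filter and composite score can be sketched directly from the definitions above. The weights (40/30/20/10) and the 15% cap come from the article; the per-component normalizations (e.g. a 1 km maximum walking distance) are illustrative assumptions:

```python
MAX_DETOUR = 0.15  # 15% cap on driver detour, per the article

def detour_ratio(original_min, with_detour_min):
    """detour_ratio = detour_time / original_time - 1 (0.10 = 10% longer)."""
    return with_detour_min / original_min - 1

def composite_score(detour, schedule_fit, walk_km, rating):
    """Rank a phase-2 survivor; returns None if the detour cap is exceeded.
    schedule_fit is assumed already normalized to [0, 1]."""
    if detour > MAX_DETOUR:
        return None                                  # filtered out in phase 2
    detour_component = 1 - detour / MAX_DETOUR       # 1.0 = zero detour
    walk_component = max(0.0, 1 - walk_km / 1.0)     # assumed 1 km max walk
    rating_component = rating / 5.0                  # 5-star scale
    return (0.4 * detour_component + 0.3 * schedule_fit
            + 0.2 * walk_component + 0.1 * rating_component)
```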

Schedule Matching Service

Carpooling is schedule-driven — driver and rider must agree on time. Each offer has a cron-like schedule (days of week + departure time ± 15 min flexibility window). The Schedule Matcher evaluates rider query time against offer schedules, considering time zone, recurrence, and one-time vs. recurring trips. Schedule matches are cached per (origin_h3, destination_h3, day_of_week) combination with a 1-hour TTL to accelerate repeated similar queries.
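The recurrence check described above can be sketched as a small class; the days-of-week set plus the ±15 minute window follow the article, while the class and method names are assumptions:

```python
from datetime import datetime, time, timedelta

class RecurringOffer:
    """Cron-like schedule: days of week plus a departure time with a
    +/- 15 minute flexibility window (per the article)."""
    def __init__(self, days, departure, flex_min=15):
        self.days = set(days)               # 0=Mon .. 6=Sun
        self.departure = departure          # datetime.time
        self.flex = timedelta(minutes=flex_min)

    def matches(self, desired: datetime) -> bool:
        """True if the rider's desired time falls on an offered day and
        within the flexibility window around departure."""
        if desired.weekday() not in self.days:
            return False
        dep = datetime.combine(desired.date(), self.departure)
        return abs(desired - dep) <= self.flex
```

In production this check runs against offers pre-filtered by the (origin_h3, destination_h3, day_of_week) cache key mentioned above, so only a small set of schedules is evaluated per query.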

Trust & Verification Service

Carpooling requires higher trust than anonymous ride-hailing. The Trust Service manages: phone verification (Twilio OTP), email verification, government ID verification (Stripe Identity or Onfido API), LinkedIn profile linking (optional), and trip history. Trust scores gate feature access: new users can only request seats, while drivers need 10+ completed trips before they can accept unverified riders. All verification results are stored in PostgreSQL with encryption at rest; the trust score is a weighted formula updated asynchronously.
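The article specifies that the trust score is a weighted formula but not the weights, so the following sketch fills in purely illustrative values to show the shape of the asynchronous recomputation:

```python
def trust_score(phone, email, gov_id, linkedin, trips_completed, avg_rating):
    """Illustrative weighted trust score in [0, 100]. All weights below are
    assumptions; only the input signals come from the article."""
    score = 0.0
    score += 15 if phone else 0          # Twilio OTP verified
    score += 10 if email else 0
    score += 30 if gov_id else 0         # Stripe Identity / Onfido result
    score += 5 if linkedin else 0        # optional profile link
    score += min(trips_completed, 20)    # up to 20 points for trip history
    score += (avg_rating / 5.0) * 20     # up to 20 points from reviews
    return round(score, 1)
```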

Database Design

Offers in PostgreSQL with PostGIS: (offer_id, driver_id, origin_geom, destination_geom, route_polyline, h3_cells ARRAY, departure_time, recurrence, seats_available, price_per_seat, status). A GIN index on h3_cells enables fast overlap queries (h3_cells && query_h3_cells). Bookings in PostgreSQL: (booking_id, offer_id, rider_id, pickup_point, dropoff_point, status, created_at). Messages in Cassandra: (conversation_id, sender_id, message_text, sent_at) partitioned by conversation_id for fast history retrieval. User profiles and trust scores in PostgreSQL.

API Design

  • POST /v1/offers — Driver posts route (origin, destination, waypoints, schedule, seats, price); returns offer_id and H3 corridor visualization
  • POST /v1/search — Rider submits origin, destination, desired_time, seats_needed; returns ranked list of compatible offers with detour info and cost share
  • POST /v1/offers/{offer_id}/book — Rider requests a seat on an offer; driver receives notification and must ACCEPT/DECLINE within 24 hours
  • GET /v1/bookings/{booking_id}/chat — WebSocket for in-app messaging between driver and booked rider
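A possible request/response shape for POST /v1/search, built from the fields listed above (the exact JSON schema, field names, and example values are assumptions):

```python
import json

# Rider's search request: origin, destination, desired_time, seats_needed.
search_request = {
    "origin": {"lat": 37.7749, "lng": -122.4194},
    "destination": {"lat": 37.3382, "lng": -121.8863},
    "desired_time": "2025-01-20T08:00:00-08:00",
    "seats_needed": 1,
}

# One entry from a hypothetical ranked response, with detour info and cost share.
sample_response = {
    "results": [{
        "offer_id": "ofr_123",
        "detour_ratio": 0.08,            # 8% detour, under the 15% cap
        "pickup_walk_m": 250,
        "estimated_cost_share": 4.50,
        "driver_rating": 4.9,
    }]
}

body = json.dumps(search_request)        # serialized request payload
```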

Scaling & Bottlenecks

Route matching is the CPU bottleneck. The H3 cell filter reduces candidate sets from 200,000 to ~1,000, but detailed detour computation (a routing API call per candidate) still means ~6,000 routing calls/second at the 6-search/second average, and ~60,000/second at the morning peak. This is mitigated by: (1) caching routing results for frequently queried origin-destination pairs in Redis, (2) pre-computing common corridor detours in a background job during off-peak hours, and (3) parallelizing the ~1,000 candidate evaluations across a worker pool (e.g., 100 goroutines per search request).
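The parallel candidate evaluation can be sketched as below. The routing API and its cache are stubbed with an in-process table (in production this is a network call whose results land in Redis); function names and the fake routing data are assumptions:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the routing API + Redis cache: a fixed origin/destination table.
ROUTE_TABLE = {("A", "B"): 40}  # fake base route time, minutes

def route_minutes(origin, destination):
    return ROUTE_TABLE.get((origin, destination), 30)

def evaluate_candidates(candidates, max_workers=100):
    """Evaluate all candidates in parallel (the article suggests ~100 workers
    per search request) and keep those within the 15% detour cap."""
    def detour_ok(offer):
        base = route_minutes(offer["origin"], offer["destination"])
        with_detour = base + offer["pickup_detour_min"]
        return with_detour / base - 1 <= 0.15

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        keep = list(pool.map(detour_ok, candidates))
    return [c for c, ok in zip(candidates, keep) if ok]
```

Threads (or goroutines, in the Go deployment the article hints at) suit this workload because each evaluation is I/O-bound on the routing API, not CPU-bound.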

Search result caching: carpool search results for popular corridor + time combinations (e.g., downtown → tech campus, 8 AM Monday) are cached in Redis for 10 minutes. This dramatically reduces routing API calls for redundant queries during morning peak.
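The 10-minute result cache can be modeled with a small TTL wrapper; this in-process sketch stands in for the Redis deployment described above (class and method names are assumptions):

```python
import time

class SearchResultCache:
    """In-process sketch of the Redis search-result cache: entries for a
    (origin corridor, destination corridor, hour) key expire after 10 minutes."""
    def __init__(self, ttl_seconds=600):
        self.ttl = ttl_seconds
        self.store = {}                  # key -> (expires_at, results)

    def key(self, origin_cell, dest_cell, hour):
        return (origin_cell, dest_cell, hour)

    def get(self, key):
        entry = self.store.get(key)
        if entry and entry[0] > time.time():
            return entry[1]
        return None                      # miss or expired

    def put(self, key, results):
        self.store[key] = (time.time() + self.ttl, results)
```

With Redis the same behavior is a SETEX with a 600-second TTL; the sketch just makes the expiry logic explicit.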

Key Trade-offs

  • Detour threshold (15% vs. more flexible) — stricter detour limits improve driver experience but reduce match rates; 15% is empirically validated as the acceptance threshold for most drivers
  • Recurring vs. on-demand carpooling — recurring schedules enable better route pre-computation and trust-building but require commitment; on-demand carpooling is more flexible but harder to match (Uber Pool model)
  • Open platform vs. enterprise carpooling — enterprise carpooling (employees of same company) has higher trust and schedule predictability; open platforms have larger supply/demand pools but more friction
  • Cost sharing as incentive vs. profit — regulations in many jurisdictions prohibit paid carpooling above cost recovery; platforms must enforce and track per-ride cost calculations to stay in the non-commercial sharing exemption
