System Design: Service Mesh
Design a service mesh like Istio or Linkerd that provides zero-trust mTLS, intelligent traffic management, observability, and resilience patterns for microservices without application code changes.
Requirements
Functional Requirements:
- Transparently encrypt all service-to-service communication with mTLS without code changes
- Implement traffic management: load balancing, canary releases, traffic splitting, circuit breaking
- Collect distributed traces, metrics (RED: rate/error/duration), and access logs for all service traffic
- Enforce authorization policies: service A can call service B's /api endpoint but not /admin
- Support traffic mirroring (shadow traffic to canary for testing without affecting production)
- Provide service discovery and health-based routing
Non-Functional Requirements:
- Proxy latency overhead under 1ms p99 per hop (sidecar must not significantly impact service latency)
- Support 10,000+ service instances across 100+ services in a single mesh
- Certificate rotation for mTLS with zero downtime (SPIFFE/SPIRE standard)
- Control plane updates propagate to all proxies within 5 seconds
Scale Estimation
10,000 sidecar proxies, each handling 10,000 RPS = 100M RPS total mesh traffic. Each proxy maintains upstream connection pools to ~20 services (average fan-out). Control plane must push xDS configuration updates to 10,000 proxies — each update is ~10 KB, so a single flag change triggers 100 MB of data transfer. Telemetry: each proxy emits ~500 metrics/sec; 10,000 proxies = 5M metrics/sec into the observability pipeline. mTLS certificate rotation: 10,000 certs, each rotated every 24 hours = 10,000 / 86,400 s ≈ 0.12 rotations/sec (roughly 7 per minute) average load on the CA.
High-Level Architecture
The service mesh has two planes. The data plane consists of lightweight proxy sidecars (Envoy is the de facto standard) injected into every pod via a mutating admission webhook in Kubernetes. The sidecar intercepts all inbound and outbound traffic using iptables rules, transparent to the application. The control plane (Istiod in Istio, the destination service in Linkerd) configures all sidecars via the xDS (discovery service) API — a gRPC streaming protocol where proxies maintain long-lived connections to the control plane and receive configuration updates as the mesh changes.
The xDS API has several sub-services: LDS (listener discovery) configures inbound/outbound listeners; RDS (route discovery) configures HTTP routing rules; CDS (cluster discovery) configures upstream service clusters; EDS (endpoint discovery) configures the actual IP:port endpoints within each cluster. Proxies apply these configurations dynamically — a canary traffic split from 0% to 10% is a CDS/RDS update that propagates to all proxies within seconds.
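To make the push model concrete, here is a minimal Go sketch of a traffic-split change flowing out as an RDS-style update to all connected proxies. The types and the pushToProxies function are simplified stand-ins for illustration, not Envoy's or go-control-plane's actual API.

```go
// Illustrative sketch of how a traffic-split change becomes an xDS push.
// These types are simplified stand-ins, not the real Envoy/xDS schema.
package main

import "fmt"

// WeightedCluster assigns a share of traffic to one upstream cluster.
type WeightedCluster struct {
	Cluster string
	Weight  uint32 // percentage out of 100
}

// RouteConfig is a minimal RDS-style resource: one route, weighted targets.
type RouteConfig struct {
	Name     string
	Clusters []WeightedCluster
}

// pushToProxies models the control plane fanning a new config version out to
// every connected proxy over its long-lived stream.
func pushToProxies(version string, rc RouteConfig, streams []chan RouteConfig) {
	for _, s := range streams {
		s <- rc // in real xDS this is a DiscoveryResponse with a version and nonce
	}
	fmt.Printf("pushed %s v%s to %d proxies\n", rc.Name, version, len(streams))
}

func main() {
	// A canary shift from 0% to 10% is just a new weighted route resource.
	canary := RouteConfig{
		Name: "reviews-route",
		Clusters: []WeightedCluster{
			{Cluster: "reviews-v1", Weight: 90},
			{Cluster: "reviews-v2", Weight: 10},
		},
	}
	streams := []chan RouteConfig{make(chan RouteConfig, 1), make(chan RouteConfig, 1)}
	pushToProxies("42", canary, streams)
}
```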
Certificate management follows the SPIFFE standard: each service gets a cryptographic identity encoded in an X.509 certificate (SVID) with a URI SAN like spiffe://cluster.local/ns/default/sa/my-service. The mesh CA (built into the control plane) issues short-lived certs (24h TTL) rotated automatically. mTLS handshakes use these certs — both sides verify the peer's SPIFFE identity, enabling cryptographic service identity without IP-based trust.
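The identity check itself is small. Below is a hedged sketch of SPIFFE peer verification during an mTLS handshake using only Go's standard library; the expectedID constant is an assumed value for illustration, and a real mesh derives the allowed identities from authorization policy rather than hardcoding one.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
)

// Assumed identity for illustration; real meshes derive this from policy.
const expectedID = "spiffe://cluster.local/ns/default/sa/my-service"

// verifySPIFFE runs after standard chain validation and checks that the
// peer's leaf certificate carries the expected SPIFFE URI SAN.
func verifySPIFFE(rawCerts [][]byte, chains [][]*x509.Certificate) error {
	if len(chains) == 0 || len(chains[0]) == 0 {
		return fmt.Errorf("no verified chain")
	}
	for _, uri := range chains[0][0].URIs {
		if uri.String() == expectedID {
			return nil // peer holds the identity we authorized
		}
	}
	return fmt.Errorf("peer SPIFFE ID not authorized")
}

// meshTLSConfig enforces mutual TLS against the mesh CA bundle.
func meshTLSConfig(certs []tls.Certificate, roots *x509.CertPool) *tls.Config {
	return &tls.Config{
		Certificates:          certs,
		RootCAs:               roots, // mesh CA bundle for outbound verification
		ClientCAs:             roots, // and for verifying inbound clients
		ClientAuth:            tls.RequireAndVerifyClientCert,
		MinVersion:            tls.VersionTLS12,
		VerifyPeerCertificate: verifySPIFFE,
	}
}

func main() {
	cfg := meshTLSConfig(nil, x509.NewCertPool())
	fmt.Println("mutual TLS required:", cfg.ClientAuth == tls.RequireAndVerifyClientCert)
}
```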
Core Components
Envoy Sidecar Proxy
Envoy is a high-performance C++ proxy with a plugin architecture (filters). For inbound traffic: the sidecar terminates mTLS, extracts the peer SPIFFE identity, evaluates authorization policy, collects telemetry, and forwards to localhost (the application). For outbound traffic: the sidecar intercepts connections to other services (via iptables REDIRECT), performs DNS-based service resolution, selects an endpoint via load balancing (least request, round-robin, or Maglev consistent hashing for session affinity), establishes mTLS to the destination sidecar, and forwards. The entire path adds <0.5ms when traffic stays local to a node.
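Envoy's least-request balancer uses power-of-two-choices sampling rather than a full scan. A minimal Go sketch of that selection logic follows; the endpoint type and addresses are illustrative.

```go
// Sketch of "least request" endpoint selection as Envoy implements it:
// sample two random endpoints, pick the one with fewer in-flight requests.
package main

import (
	"fmt"
	"math/rand"
	"sync/atomic"
)

type endpoint struct {
	addr     string
	inFlight atomic.Int64 // outstanding requests to this endpoint
}

// pickLeastRequest does power-of-two-choices selection over healthy endpoints.
func pickLeastRequest(eps []*endpoint) *endpoint {
	a := eps[rand.Intn(len(eps))]
	b := eps[rand.Intn(len(eps))]
	if b.inFlight.Load() < a.inFlight.Load() {
		return b
	}
	return a
}

func main() {
	eps := []*endpoint{{addr: "10.0.0.1:9080"}, {addr: "10.0.0.2:9080"}, {addr: "10.0.0.3:9080"}}
	eps[0].inFlight.Store(5) // simulate a busy endpoint
	chosen := pickLeastRequest(eps)
	chosen.inFlight.Add(1) // the proxy tracks in-flight counts per upstream
	fmt.Println("routing to", chosen.addr)
}
```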
Control Plane (Istiod)
Istiod aggregates three functions: the xDS server (configures all Envoy sidecars), the certificate authority (issues SPIFFE SVIDs), and the service registry (watches the Kubernetes API server for pod/service changes and translates them into xDS configuration). When a new pod starts, the admission webhook injects the sidecar, the cert controller signs a CSR generated by the sidecar's agent at startup, and Istiod pushes xDS updates to the proxies that have this service as an upstream. Istiod itself runs as a multi-replica deployment with leader election for the CA function.
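Here is a simplified sketch of that registry-to-xDS loop, assuming invented EndpointEvent and proxy types in place of the real Kubernetes watch and xDS APIs. The point it illustrates: pushes are scoped to proxies that actually consume the changed service.

```go
// Simplified sketch of the registry->xDS translation loop: react to endpoint
// events and push EDS updates only to proxies that use that service.
package main

import "fmt"

// EndpointEvent is a stand-in for a Kubernetes EndpointSlice change.
type EndpointEvent struct {
	Service string
	Ready   []string // ready pod IP:port addresses
}

type proxy struct {
	id        string
	upstreams map[string]bool // services this proxy's app calls
}

func reconcile(ev EndpointEvent, proxies []proxy) {
	for _, p := range proxies {
		if !p.upstreams[ev.Service] {
			continue // scoping pushes keeps fan-out proportional to interest
		}
		fmt.Printf("EDS push to %s: %s -> %v\n", p.id, ev.Service, ev.Ready)
	}
}

func main() {
	proxies := []proxy{
		{id: "productpage-abc", upstreams: map[string]bool{"reviews": true}},
		{id: "ratings-def", upstreams: map[string]bool{}},
	}
	reconcile(EndpointEvent{Service: "reviews", Ready: []string{"10.0.1.7:9080"}}, proxies)
}
```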
Traffic Management Policies
VirtualService and DestinationRule (Istio's CRDs) define traffic behavior. A VirtualService specifies routing rules: send 90% of traffic to subset v1, 10% to subset v2; retry on 5xx up to 3 times; timeout at 10 seconds. A DestinationRule specifies cluster configuration: load balancing algorithm, connection pool sizes, outlier detection (circuit breaking). These policies are compiled by Istiod into xDS config and pushed to relevant proxies. Authorization policies (AuthorizationPolicy CRD) define RBAC rules: source.principal == spiffe://.../productpage is required to call destination.service == reviews:9080.
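As a rough illustration of what the compiled policy does at request time, here is a hedged Go sketch of weighted subset selection plus bounded retries on 5xx. RoutePolicy and its fields are invented for the example and do not mirror Istio's or Envoy's actual schema.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// RoutePolicy is an invented stand-in for a compiled VirtualService route.
type RoutePolicy struct {
	Weights    map[string]int // subset -> percentage, summing to 100
	MaxRetries int
	Timeout    time.Duration // enforcement elided in this sketch
}

// chooseSubset picks a destination subset proportionally to its weight.
func chooseSubset(p RoutePolicy) string {
	n := rand.Intn(100)
	for subset, w := range p.Weights {
		if n < w {
			return subset
		}
		n -= w
	}
	return ""
}

// send makes one attempt plus up to MaxRetries retries, retrying only on 5xx.
func send(p RoutePolicy, do func(subset string) (status int)) int {
	for attempt := 0; attempt <= p.MaxRetries; attempt++ {
		status := do(chooseSubset(p))
		if status < 500 {
			return status // only 5xx triggers a retry, per the route's retry policy
		}
	}
	return 503
}

func main() {
	policy := RoutePolicy{Weights: map[string]int{"v1": 90, "v2": 10}, MaxRetries: 3, Timeout: 10 * time.Second}
	status := send(policy, func(subset string) int {
		fmt.Println("attempt ->", subset)
		return 200
	})
	fmt.Println("final status:", status)
}
```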
Database Design
The control plane is largely stateless — Kubernetes is the database. All configuration (VirtualServices, DestinationRules, AuthorizationPolicies) is stored as Kubernetes CRDs. Istiod watches these resources via the Kubernetes watch API (a long-lived streaming request). Service endpoints come from Kubernetes Endpoints/EndpointSlice objects. The only persistent state is the CA's signing key, stored as a Kubernetes Secret (or in an external KMS/HSM for production).
Telemetry data (traces, metrics, logs) flows out of Envoy via: Prometheus scraping for metrics, OpenTelemetry/Zipkin for traces, and stdout JSON for logs collected by the node log agent. A separate observability stack (Prometheus + Grafana, Jaeger/Tempo, Loki) stores and queries this data — the mesh itself has no persistence for telemetry.
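A sketch of what per-request RED instrumentation looks like, written as an HTTP middleware with the Prometheus Go client; the sidecar does the equivalent internally, and the metric names here are illustrative, not the mesh's actual series names.

```go
package main

import (
	"net/http"
	"strconv"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	requests = prometheus.NewCounterVec(
		prometheus.CounterOpts{Name: "mesh_requests_total", Help: "Rate and errors, by response code."},
		[]string{"code"},
	)
	duration = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{Name: "mesh_request_duration_seconds", Help: "Request latency.", Buckets: prometheus.DefBuckets},
		[]string{"code"},
	)
)

// statusRecorder captures the response code so it can be used as a label.
type statusRecorder struct {
	http.ResponseWriter
	code int
}

func (r *statusRecorder) WriteHeader(c int) { r.code = c; r.ResponseWriter.WriteHeader(c) }

// red records one counter increment and one latency observation per request.
func red(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		start := time.Now()
		rec := &statusRecorder{ResponseWriter: w, code: 200}
		next.ServeHTTP(rec, req)
		code := strconv.Itoa(rec.code)
		requests.WithLabelValues(code).Inc()
		duration.WithLabelValues(code).Observe(time.Since(start).Seconds())
	})
}

func main() {
	prometheus.MustRegister(requests, duration)
	http.Handle("/metrics", promhttp.Handler()) // scraped by Prometheus
	http.Handle("/", red(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})))
	http.ListenAndServe(":8080", nil)
}
```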
API Design
The mesh has two API surfaces, both introduced above: operators express intent declaratively through Kubernetes CRDs (VirtualService, DestinationRule, AuthorizationPolicy), and the control plane programs the data plane through the xDS gRPC streaming API (LDS/RDS/CDS/EDS). There is no bespoke request/response API to design — a configuration change is a CRD write that Istiod compiles into xDS pushes.
Scaling & Bottlenecks
The xDS server bottlenecks when all 10,000 proxies simultaneously reconnect (e.g., after a control plane restart). Each reconnect requires sending the full config state — a thundering herd problem. Mitigation: exponential backoff with jitter on reconnect; delta xDS (only sending changed resources, not full state on reconnect); sharding the xDS server by namespace or service group so no single instance serves all proxies.
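The jittered backoff is simple but essential. A minimal Go sketch of the "full jitter" variant:

```go
// Reconnect backoff with full jitter: spreads 10,000 proxies' reconnections
// out over time instead of letting them stampede the control plane at once.
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// backoff returns the wait before reconnect attempt n: a uniformly random
// delay in [0, min(maxDelay, base*2^n)).
func backoff(attempt int, base, maxDelay time.Duration) time.Duration {
	d := base << attempt // base * 2^attempt
	if d > maxDelay || d <= 0 {
		d = maxDelay // also guards against shift overflow
	}
	return time.Duration(rand.Int63n(int64(d)))
}

func main() {
	for attempt := 0; attempt < 5; attempt++ {
		fmt.Println("attempt", attempt, "sleep", backoff(attempt, time.Second, 30*time.Second))
	}
}
```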
Certificate issuance scales linearly with sidecar count. During cluster scale-out (1,000 new pods starting simultaneously), the CA must issue 1,000 certs in seconds. The CA is CPU-bound (RSA/ECDSA signing). Mitigation: use ECDSA P-256 (faster than RSA-2048), batch CSR processing, and scale the CA horizontally with multiple replicas sharing a signing key stored in a KMS.
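Below is a sketch of the issuance hot path using Go's standard library, showing a P-256 CA key signing a 24-hour SPIFFE SVID; CSR validation and error handling are elided for brevity, and the identity string is illustrative.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"net/url"
	"time"
)

func main() {
	// CA key: ECDSA P-256 signs far faster than RSA-2048 on the same CPU.
	caKey, _ := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	caTmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "mesh-ca"},
		NotBefore:             time.Now(),
		NotAfter:              time.Now().Add(10 * 365 * 24 * time.Hour),
		IsCA:                  true,
		KeyUsage:              x509.KeyUsageCertSign,
		BasicConstraintsValid: true,
	}
	caDER, _ := x509.CreateCertificate(rand.Reader, caTmpl, caTmpl, &caKey.PublicKey, caKey)
	caCert, _ := x509.ParseCertificate(caDER)

	// Workload SVID: short-lived cert carrying a SPIFFE URI SAN.
	wlKey, _ := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	id, _ := url.Parse("spiffe://cluster.local/ns/default/sa/my-service")
	svidTmpl := &x509.Certificate{
		SerialNumber: big.NewInt(2),
		URIs:         []*url.URL{id},
		NotBefore:    time.Now(),
		NotAfter:     time.Now().Add(24 * time.Hour), // forces daily rotation
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth, x509.ExtKeyUsageClientAuth},
	}
	svidDER, err := x509.CreateCertificate(rand.Reader, svidTmpl, caCert, &wlKey.PublicKey, caKey)
	fmt.Println("issued SVID,", len(svidDER), "bytes, err:", err)
}
```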
Key Trade-offs
- Sidecar vs. node-level proxy (ambient mesh): Sidecars provide strong workload isolation but consume significant memory (~50 MB per pod); ambient mesh (Istio's newer model) uses a per-node proxy, reducing overhead but weakening isolation boundaries
- mTLS STRICT vs. PERMISSIVE: STRICT mode enforces mTLS for all traffic (zero-trust) but breaks legacy services that don't support mTLS; PERMISSIVE accepts both mTLS and plaintext during migration
- L4 vs. L7 policies: L4 policies (IP/port) are simple and fast; L7 policies (HTTP path, headers) enable fine-grained control but require HTTP parsing overhead in the sidecar
- Control plane availability: If Istiod is down, data plane continues with existing config (proxies cache their last-known-good state) but no new config changes apply — operations proceed but deploys are blocked