System Design: Zero Trust Network Architecture
Design a Zero Trust network architecture that eliminates implicit trust, enforces continuous authentication and authorization for every request, and provides micro-segmentation across on-premises and cloud environments. Covers identity-based access, mTLS, policy engines, and device attestation.
Requirements
Functional Requirements:
- Authenticate and authorize every request between services using cryptographic identity (mTLS certificates)
- Enforce continuous authentication for users: re-verify identity and device health on every sensitive action
- Implement micro-segmentation: services can only communicate with explicitly permitted peers
- Provide a policy engine that evaluates access requests based on identity, device posture, and context
- Support hybrid environments: enforce Zero Trust policies across on-premises servers, cloud VMs, and Kubernetes pods
- Integrate with existing LDAP/AD directories and OIDC providers for user identity
Non-Functional Requirements:
- Policy evaluation adds less than 2ms to service-to-service request latency
- mTLS handshake overhead under 10ms for new connections; near-zero for resumed TLS sessions
- Support 50,000 service instances with unique certificates managed automatically via a service mesh
- Certificate rotation every 24 hours without service downtime
- 99.999% availability for the policy enforcement plane
Scale Estimation
50,000 service instances each making 1,000 inter-service calls/second = 50 million service-to-service requests/second. mTLS session resumption (TLS session tickets) avoids the full handshake for 99% of requests; only new connections (connection pool misses) trigger a full handshake. At 1% new connections: 500,000 full mTLS handshakes/second distributed across the network. Policy evaluation: 50 million authorization decisions/second — these must be served from a local cache, not a centralized policy engine.
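The back-of-the-envelope numbers above can be checked with a few lines of arithmetic:

```python
# Sanity check of the scale estimates: total request rate and the share
# of requests that require a full mTLS handshake.

INSTANCES = 50_000
CALLS_PER_INSTANCE_PER_SEC = 1_000
NEW_CONNECTION_RATE = 0.01  # 1% of requests miss the connection pool

total_rps = INSTANCES * CALLS_PER_INSTANCE_PER_SEC
full_handshakes_per_sec = int(total_rps * NEW_CONNECTION_RATE)

print(total_rps)                # 50000000 requests/second
print(full_handshakes_per_sec)  # 500000 full handshakes/second
```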
High-Level Architecture
Zero Trust Network Architecture rests on three principles: verify explicitly (every request is authenticated and authorized regardless of network location), use least privilege access (minimum required access per request), and assume breach (design as if the network is already compromised, and segment everything to limit lateral movement). The implementation has three planes: the Control Plane (policy management and certificate authority), the Data Plane (sidecar proxies enforcing policy), and the Identity Plane (user and workload identity).
Service identity is established through SPIFFE/SPIRE (Secure Production Identity Framework For Everyone). SPIRE issues short-lived X.509-SVID (SVID = SPIFFE Verifiable Identity Document) certificates to every workload based on its platform attestation (Kubernetes pod labels, EC2 instance metadata). These certificates are automatically rotated every 24 hours. Service mesh sidecars (Envoy proxies, typically deployed via Istio) terminate and initiate mTLS for all inter-service communication, using the SPIFFE SVIDs to authenticate each peer.
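A minimal sketch of the identity format involved (not the SPIRE API): constructing the SPIFFE ID that ends up in an SVID's URI SAN, and checking that a presented ID belongs to the expected trust domain.

```python
# Sketch: build a SPIFFE ID for a Kubernetes workload and verify that a
# peer's SPIFFE ID belongs to the expected trust domain. Function names
# are illustrative, not part of any SPIRE library.
from urllib.parse import urlparse

def spiffe_id(trust_domain: str, namespace: str, service_account: str) -> str:
    # Matches the format used in this design: spiffe://{td}/ns/{ns}/sa/{sa}
    return f"spiffe://{trust_domain}/ns/{namespace}/sa/{service_account}"

def same_trust_domain(peer_id: str, trust_domain: str) -> bool:
    parsed = urlparse(peer_id)
    return parsed.scheme == "spiffe" and parsed.netloc == trust_domain

sid = spiffe_id("corp", "frontend", "web-server")
print(sid)  # spiffe://corp/ns/frontend/sa/web-server
```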
Policy enforcement uses Open Policy Agent (OPA) sidecar instances deployed alongside each service. When a sidecar receives a request, it calls its local OPA instance (local HTTP call, <0.5ms) to evaluate the authorization policy: inputs are the caller's SPIFFE identity, the target service and endpoint, the request HTTP method, and context attributes (time of day, request header values). OPA evaluates Rego policies and returns allow/deny. Policies are distributed from a central Policy Administration Point (PAP) to all OPA instances via OPA's bundle mechanism: each instance polls for and downloads the Rego bundle every 30 seconds.
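In production the sidecar POSTs an input document to its local OPA and a Rego policy decides; the toy Python stand-in below shows the shape of that input document and a decision equivalent to a simple allow rule (all names are illustrative).

```python
# Hypothetical stand-in for the authorization input a sidecar sends to
# its local OPA, plus a toy allow/deny check mirroring a Rego policy
# that whitelists caller identities and HTTP methods.

def authorize(input_doc: dict, allowed_callers: set) -> bool:
    return (
        input_doc["caller"]["spiffe_id"] in allowed_callers
        and input_doc["method"] in {"GET", "POST"}
    )

request = {
    "caller": {"spiffe_id": "spiffe://corp/ns/frontend/sa/web-server"},
    "target": {"service": "payments", "path": "/charge"},
    "method": "POST",
}
print(authorize(request, {"spiffe://corp/ns/frontend/sa/web-server"}))  # True
```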
Core Components
SPIRE Workload Attestation
SPIRE Server is the root of trust for workload identity. SPIRE Agents run on every node and attest workloads using platform-specific attestors: Kubernetes pod UID + service account (for K8s), EC2 instance identity document + IAM role (for AWS VMs), or TPM attestation for bare-metal. Upon successful attestation, SPIRE issues a short-lived (24-hour) X.509 certificate with the SPIFFE ID spiffe://{trust_domain}/ns/{namespace}/sa/{service_account} encoded in the Subject Alternative Name. Certificate rotation is handled automatically by the SPIRE Agent, which fetches a new certificate 1 hour before expiry using the existing certificate as proof of identity.
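The rotation rule above (fetch a replacement SVID one hour before the 24-hour certificate expires) can be sketched as a simple time check; the constants come from this design, not from SPIRE defaults.

```python
# Sketch of the agent's rotation trigger: rotate when the current SVID
# is within one hour of its 24-hour expiry.
from datetime import datetime, timedelta, timezone

SVID_TTL = timedelta(hours=24)
ROTATION_MARGIN = timedelta(hours=1)

def needs_rotation(issued_at: datetime, now: datetime) -> bool:
    expiry = issued_at + SVID_TTL
    return now >= expiry - ROTATION_MARGIN

issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(needs_rotation(issued, issued + timedelta(hours=22)))              # False
print(needs_rotation(issued, issued + timedelta(hours=23, minutes=30)))  # True
```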
Envoy/Istio Sidecar Proxy
Every service pod has an Envoy sidecar injected (via Kubernetes admission webhook). Envoy intercepts all inbound and outbound traffic. For outbound requests, Envoy initiates mTLS using the SPIFFE SVID, verifying the server's certificate chain against the SPIRE CA. For inbound requests, Envoy validates the caller's SPIFFE certificate and extracts the SPIFFE ID as the peer identity. This identity is forwarded to the application in an X-Forwarded-Client-Cert header and to OPA for authorization decisions. TLS session resumption is enabled (session tickets, 1-hour TTL) to eliminate handshake overhead for ongoing connections.
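On the receiving side, the application can recover the caller identity from the X-Forwarded-Client-Cert (XFCC) header Envoy sets. A minimal parser for the URI field of that header might look like this (the header value here is a simplified example, not a full XFCC string):

```python
# Sketch: extract the caller's SPIFFE ID from the URI key of the
# X-Forwarded-Client-Cert header, which Envoy formats as ;-separated
# key=value pairs describing the validated client certificate.

def peer_spiffe_id(xfcc: str):
    for part in xfcc.split(";"):
        key, _, value = part.partition("=")
        if key.strip() == "URI":
            return value.strip('"')
    return None

header = 'Hash=abc123;URI=spiffe://corp/ns/frontend/sa/web-server'
print(peer_spiffe_id(header))  # spiffe://corp/ns/frontend/sa/web-server
```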
Policy Administration Point (PAP)
The PAP stores authorization policies as Rego (OPA's policy language). Policies define: which SPIFFE identities can call which services (allow { input.caller.spiffe_id == "spiffe://corp/ns/frontend/sa/web-server" }), which HTTP methods and paths are permitted, and context-based restrictions (e.g., the payment service can only be called during business hours and only by requests originating on the corporate network). Policy changes go through a GitOps review process (PR + approval) before being pushed to the policy bundle server. OPA instances pull new bundles every 30 seconds, with a 1-minute worst-case propagation delay.
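A toy version of the context-based restriction mentioned above, checking a business-hours window (times and window are illustrative; a real Rego policy would use the request's timestamp and the corporate timezone):

```python
# Sketch of a business-hours context check like the payment-service
# restriction described above. The 09:00-17:00 window is an assumption.
from datetime import time

BUSINESS_HOURS = (time(9, 0), time(17, 0))

def within_business_hours(t: time) -> bool:
    start, end = BUSINESS_HOURS
    return start <= t <= end

print(within_business_hours(time(10, 30)))  # True
print(within_business_hours(time(3, 0)))    # False
```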
Database Design
SPIRE Server uses an embedded SQLite or external PostgreSQL for registration entries: registration_entries (entry_id, spiffe_id, parent_id, selectors JSON, ttl_seconds, federates_with TEXT[]), nodes (node_id, attestation_type, cert_serial, expiry). Policy bundles are stored as versioned files in S3 (or Git), with a CDN layer for fast global distribution to OPA agents. Audit log of authorization decisions (allow/deny) is written to Kafka: (ts, caller_spiffe_id, target_spiffe_id, method, path, decision, policy_version, latency_us).
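The audit record schema above, serialized as it might be before being produced to Kafka (field names taken from the schema in the text; the serialization format is an assumption):

```python
# Sketch: the authorization audit record from the schema above as a
# dataclass, serialized to JSON for a Kafka message value.
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class AccessDecision:
    ts: float
    caller_spiffe_id: str
    target_spiffe_id: str
    method: str
    path: str
    decision: str        # "allow" or "deny"
    policy_version: str
    latency_us: int

rec = AccessDecision(time.time(), "spiffe://corp/ns/frontend/sa/web-server",
                     "spiffe://corp/ns/payments/sa/api", "POST", "/charge",
                     "allow", "v42", 310)
payload = json.dumps(asdict(rec))  # message value for the Kafka audit topic
```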
API Design
POST /policies — Submit a new Rego policy bundle; validated syntactically and semantically before activation.
GET /policies/evaluate — Dry-run policy evaluation for a hypothetical request (for debugging and testing).
GET /workloads/{spiffe_id}/certificate — Return the current certificate and expiry for a registered workload.
GET /audit/access?caller={spiffe_id}&from={ts}&to={ts} — Return authorization decision history for a workload.
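A hypothetical request body for the dry-run evaluation endpoint — the exact schema is an assumption; it mirrors the OPA input document described earlier:

```python
# Illustrative payload for /policies/evaluate (dry-run). Field names
# follow the OPA input shape used elsewhere in this design.
import json

dry_run = {
    "caller": {"spiffe_id": "spiffe://corp/ns/frontend/sa/web-server"},
    "target": {"service": "payments", "endpoint": "/charge"},
    "method": "POST",
    "context": {"hour_utc": 14},
}
body = json.dumps(dry_run)
```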
Scaling & Bottlenecks
Centralized policy evaluation at 50 million decisions/second is impossible; OPA must run as a local sidecar making in-process decisions. Policy bundle size grows with organizational complexity; large Rego policies (10,000+ lines) can take 50ms to evaluate. Partial evaluation (pre-compiling policies against known static inputs like the target service identity) reduces evaluation time from 50ms to <1ms by pre-computing the allow set for each service at bundle load time.
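The partial-evaluation idea can be sketched as follows: at bundle load time, precompute the set of caller identities allowed to reach this service, so the per-request hot path is a set lookup rather than full policy evaluation (data and function names are illustrative).

```python
# Sketch of partial evaluation: compile the policy against the known
# static input (this service's identity) once per bundle load, leaving
# an O(1) membership check on the hot path.

POLICY_RULES = [
    {"caller": "spiffe://corp/ns/frontend/sa/web-server", "target": "payments"},
    {"caller": "spiffe://corp/ns/billing/sa/worker", "target": "payments"},
]

def compile_allow_set(rules, my_service):
    return frozenset(r["caller"] for r in rules if r["target"] == my_service)

ALLOW = compile_allow_set(POLICY_RULES, "payments")  # once per bundle load

def fast_authorize(caller_spiffe_id: str) -> bool:
    return caller_spiffe_id in ALLOW  # hot-path check, no Rego evaluation
```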
SPIRE Server becomes a bottleneck at certificate rotation scale: 50,000 workloads rotating every 24 hours = 50,000 certificate issuances/day = 0.58/second — negligible. However, SPIRE Server is a single point of failure for new workload attestation; HA deployment (3 SPIRE Server instances with shared PostgreSQL) and regional SPIRE deployments (each region has its own SPIRE cluster, federated with the root trust domain) eliminate single-region failure risk.
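The issuance-rate arithmetic above, spelled out:

```python
# 50,000 workloads each rotating once per 24 hours, spread evenly,
# gives the steady-state certificate issuance rate.
WORKLOADS = 50_000
SECONDS_PER_DAY = 86_400

issuance_rate = WORKLOADS / SECONDS_PER_DAY
print(round(issuance_rate, 2))  # 0.58 issuances/second
```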
Key Trade-offs
- SPIFFE mTLS vs. API key/JWT auth: mTLS provides cryptographic workload identity with automatic certificate lifecycle management; API keys/JWTs are easier to implement but harder to rotate automatically and lack workload attestation.
- Sidecar proxy vs. in-process SDK: Sidecar proxies enforce Zero Trust transparently without application code changes but add 1–2ms per request for the proxy hop; in-process SDKs have lower overhead but require adoption by every service.
- Policy-as-code vs. UI-managed rules: Policy-as-code (Rego in Git) provides audit history, code review, and automation-friendly management; UI-managed rules are more accessible for non-engineers but harder to audit and version.
- Strict micro-segmentation vs. network overlays: Strict micro-segmentation (deny all inter-service communication by default, whitelist per pair) minimizes blast radius but requires policy for every service-to-service relationship; network overlay VPNs (WireGuard) establish trusted zones but re-introduce implicit network trust.