
gRPC Interview Questions for Senior Engineers (2026)

Top gRPC interview questions with detailed answer frameworks covering protocol buffers, streaming patterns, service mesh integration, performance optimization, and production patterns used at Google, Netflix, and Uber.

20 min read · Updated Apr 20, 2026
Tags: interview-questions, grpc, senior-engineer, distributed-systems

Why gRPC Matters in Senior Engineering Interviews

gRPC has become the backbone of internal service communication at companies operating large-scale microservice architectures. Originally developed at Google as the successor to Stubby (their internal RPC framework that handles billions of calls per second), gRPC is now the standard for high-performance, type-safe inter-service communication at companies like Google, Netflix, Uber, Square, Dropbox, and Cloudflare. Senior engineering interviews increasingly expect fluency in gRPC design patterns, performance characteristics, and operational concerns.

Unlike REST, which was designed for resource-oriented public APIs over HTTP/1.1, gRPC is purpose-built for efficient service-to-service communication. It uses HTTP/2 for multiplexed connections, Protocol Buffers for compact binary serialization, and code generation for type-safe client/server stubs. These choices yield 5-10x throughput improvements over JSON/HTTP APIs, making gRPC essential for latency-sensitive microservice architectures.

At the senior level, interviewers expect you to reason about gRPC's protocol mechanics, understand when gRPC is appropriate versus alternatives, design service APIs using Protocol Buffers with forward/backward compatibility, implement streaming patterns for real-time data flows, and operate gRPC services in production with proper observability, load balancing, and error handling. For context on how gRPC compares to other API paradigms, see our REST vs gRPC comparison and GraphQL vs gRPC analysis. To understand gRPC's internal mechanics, explore our how gRPC works guide.

1. Explain how gRPC uses HTTP/2 and why this matters for performance.

What the interviewer is really asking: Do you understand the protocol-level advantages of gRPC beyond just binary serialization, and can you explain the specific HTTP/2 features that enable gRPC's performance characteristics?

Answer framework:

gRPC is built on HTTP/2, and this choice is fundamental to its design rather than incidental. HTTP/2 provides four critical features that gRPC exploits.

First, multiplexing. HTTP/1.1 suffers from head-of-line blocking: on a single TCP connection, requests are serialized, and a slow response blocks all subsequent responses. HTTP/2 eliminates this by multiplexing multiple streams over a single TCP connection. Each gRPC call is a separate HTTP/2 stream that can transmit concurrently without blocking. This means a single TCP connection between two services can carry thousands of concurrent gRPC calls, drastically reducing connection overhead (TCP handshake, TLS negotiation) compared to HTTP/1.1 where you need a connection pool.

Second, binary framing. HTTP/2 uses a binary frame format rather than text-based headers. Frames have well-defined lengths and types, enabling efficient parsing. Combined with Protocol Buffers' binary encoding, the entire gRPC payload pipeline is binary from serialization through transport, eliminating the overhead of text parsing.

Third, header compression via HPACK. gRPC calls carry metadata (method name, content-type, authority, custom headers) that is compressed using HPACK. Since consecutive calls to the same service share most headers, HPACK's dynamic table achieves very high compression ratios for the header portion. This matters especially for high-frequency small RPCs where headers could otherwise dominate payload size.

Fourth, flow control. HTTP/2 provides per-stream and per-connection flow control, enabling back-pressure propagation in streaming RPCs. When a receiver cannot process data fast enough, it stops granting additional window (it withholds WINDOW_UPDATE frames), which causes the sender to pause once the window is exhausted. This prevents unbounded memory growth in streaming scenarios without requiring an application-level flow control implementation.

Additionally, HTTP/2's ability to send trailers (metadata after the response body) enables gRPC's status code and error detail delivery: gRPC sends the status code in trailers, allowing the server to make its final determination after producing the response body. (HTTP/2's server push capability, by contrast, is not used by gRPC.)

The practical implication: a single long-lived HTTP/2 connection between services can replace hundreds of HTTP/1.1 connections while achieving lower latency (no connection setup per call) and higher throughput (no head-of-line blocking). This is why gRPC services typically maintain just one or a few connections per peer rather than large connection pools.
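To make the multiplexing point concrete, here is a minimal Go sketch (the pb generated package, UserService stub, and address are assumptions for illustration): a single ClientConn carries a thousand concurrent unary calls, each on its own HTTP/2 stream.

```go
package main

import (
	"context"
	"log"
	"sync"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	pb "example.com/gen/userpb" // hypothetical generated package
)

func main() {
	// One connection (one TCP + TLS handshake) is enough for all calls below.
	conn, err := grpc.Dial("user-service.internal:50051",
		grpc.WithTransportCredentials(insecure.NewCredentials())) // use TLS in production
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	client := pb.NewUserServiceClient(conn)

	var wg sync.WaitGroup
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func(id int64) {
			defer wg.Done()
			ctx, cancel := context.WithTimeout(context.Background(), time.Second)
			defer cancel()
			// Each call is an independent HTTP/2 stream multiplexed over conn;
			// a slow call does not block the others.
			if _, err := client.GetUser(ctx, &pb.GetUserRequest{Id: id}); err != nil {
				log.Printf("call %d: %v", id, err)
			}
		}(int64(i))
	}
	wg.Wait()
}
```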

Follow-up questions:

  • What is the impact of HTTP/2's TCP head-of-line blocking versus HTTP/1.1's, and how does HTTP/3 (QUIC) address this?
  • How does gRPC handle connection-level errors versus stream-level errors?
  • What are the implications of HTTP/2 multiplexing for load balancer design?

2. How do you design Protocol Buffer schemas for forward and backward compatibility?

What the interviewer is really asking: Do you understand the rules for evolving protobuf schemas without breaking existing clients and servers, and have you dealt with the practical challenges of schema evolution in production?

Answer framework:

Protocol Buffers achieve compatibility through field numbering and wire-type encoding. Each field has a permanent number that identifies it on the wire. The schema describes how to interpret bytes, but the wire format is self-describing enough that unknown fields are preserved rather than rejected.

Backward compatibility (new code reads old data) rules: never reuse a field number after removing a field (use reserved to prevent this). Never change a field's type to an incompatible encoding (for example, int32 and sint32 are both varints on the wire but encode values differently, so swapping them silently corrupts data). Adding new fields is always backward-compatible because old data simply lacks those fields, and they take their default value.

Forward compatibility (old code reads new data) rules: old code ignores unknown field numbers. This means adding fields is forward-compatible. Removing required fields (proto2) or non-optional fields that old code depends on breaks forward compatibility.

Practical rules for safe schema evolution: (1) Never change field numbers. (2) Never change field types to incompatible wire types. (3) Adding fields: always safe. (4) Removing fields: mark as reserved (both name and number) to prevent reuse. (5) Renaming fields: safe for binary compatibility (wire format uses numbers, not names), but breaks JSON serialization (which uses names). (6) Changing between optional and repeated: technically compatible on the wire but semantically different. (7) Enums: add new values freely, but old code will see them as unknown values (behavior differs by language).

Discuss the oneof pitfall: adding a field to an existing oneof is safe, but moving an existing field into a oneof is a breaking change. Also, adding a required field (proto2) is always a breaking change: old writers produce messages without that field, so new code fails to parse old data.

For production schema governance: maintain a protobuf schema registry (like Buf Schema Registry). Run buf breaking in CI to detect breaking changes automatically. Use API versioning (package myservice.v1, myservice.v2) for major incompatible changes rather than evolving the same package. Document field semantics in comments since the field name alone is often ambiguous.

Discuss the proto3 simplification: all fields are optional by default (no required keyword), unknown fields are preserved, and default values are always the zero value. This makes proto3 more compatible by default but loses the ability to distinguish "field not set" from "field set to default." The optional keyword in proto3 (reintroduced) restores presence semantics when needed.
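A short Go sketch of the presence distinction, assuming a hypothetical generated pb.User message whose proto definition declares an `optional string nickname` field (protoc-gen-go emits a pointer-typed field for proto3 optional scalars):

```go
package main

import (
	"fmt"

	pb "example.com/gen/userpb" // hypothetical generated package
)

func main() {
	u := &pb.User{}

	// GetNickname() returns the zero value whether the field is unset
	// or explicitly set to "" — it cannot distinguish the two cases.
	fmt.Println(u.GetNickname() == "") // true

	// With proto3 `optional`, the generated field is a pointer, so presence
	// can be checked explicitly.
	fmt.Println(u.Nickname == nil) // true: field was never set

	empty := ""
	u.Nickname = &empty
	fmt.Println(u.Nickname == nil)     // false: explicitly set
	fmt.Println(u.GetNickname() == "") // still true: value equals the zero value
}
```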

Follow-up questions:

  • How do you handle a field that needs to change from a scalar to a message type?
  • What is your strategy for deprecating fields across a large organization?
  • How do you version APIs when semantic meaning changes but the wire format does not?

3. Compare gRPC's four streaming patterns and explain when to use each.

What the interviewer is really asking: Do you understand unary, server-streaming, client-streaming, and bidirectional streaming at a practical level, and can you choose the right pattern for real-world problems?

Answer framework:

The four gRPC communication patterns correspond to the four combinations of single/stream for request and response.

Unary RPC (single request, single response): the most common pattern. Client sends one message, server processes it, server sends one response. Use for: simple CRUD operations, authentication, any request-response interaction that completes quickly. Example: GetUser(UserId) returns User. This is equivalent to a traditional REST API call. Most gRPC services are predominantly unary.

Server-streaming RPC (single request, stream of responses): client sends one request, server responds with a stream of messages. The stream ends when the server sends a status. Use for: downloading large datasets in chunks, real-time event feeds (subscribe to updates), long-running queries that produce results incrementally. Example: ListTransactions(AccountId, DateRange) streams Transaction messages. Advantages over unary with repeated fields: start processing before the full response is ready (time-to-first-byte), memory-efficient for large result sets, natural backpressure.

Client-streaming RPC (stream of requests, single response): client sends a stream of messages, server waits until the stream completes then sends a single response. Use for: file upload in chunks, aggregation (send many data points, receive a summary), batch ingestion where the response summarizes the batch. Example: UploadFile(stream FileChunk) returns UploadResult. The server might process chunks as they arrive or wait for the complete stream.

Bidirectional streaming RPC (stream of requests, stream of responses): both client and server send streams of messages independently. Neither side needs to wait for the other. Use for: chat applications, multiplayer game state synchronization, real-time collaborative editing, interactive sessions where both sides produce data. Example: Chat(stream ChatMessage) returns stream ChatMessage. The two streams are independent: either side can send at any time, and the order of sends and receives is not coupled.

Implementation considerations: streaming RPCs maintain a single HTTP/2 stream for the duration. Long-lived streams interact with load balancing (connection-level LB cannot rebalance mid-stream), timeouts (should you timeout the whole stream or individual messages?), and error handling (a mid-stream error terminates the entire stream). For bidirectional streams, implement heartbeats to detect dead connections. For understanding how these patterns fit into broader system architecture, see our system design interview guide and how gRPC works.
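To make the server-streaming pattern concrete, here is a minimal Go handler sketch (the Ledger service, its pb types, and loadTransactions are assumptions): returning nil ends the stream with an OK status in the trailers, while returning an error terminates it mid-stream.

```go
package main

import (
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"

	pb "example.com/gen/ledgerpb" // hypothetical generated package
)

type ledgerServer struct {
	pb.UnimplementedLedgerServer
}

// ListTransactions streams results incrementally instead of building one huge response.
func (s *ledgerServer) ListTransactions(req *pb.ListTransactionsRequest, stream pb.Ledger_ListTransactionsServer) error {
	for _, tx := range loadTransactions(req.GetAccountId()) {
		// Stop early if the client cancelled or the deadline expired.
		if err := stream.Context().Err(); err != nil {
			return status.Error(codes.Canceled, "client went away")
		}
		if err := stream.Send(tx); err != nil {
			return err // transport-level failure terminates the stream
		}
	}
	return nil // OK status is delivered in the HTTP/2 trailers
}

func loadTransactions(accountID string) []*pb.Transaction { return nil } // placeholder data source
```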

Follow-up questions:

  • How do you handle errors in the middle of a server-streaming response?
  • What is the interaction between streaming RPCs and deadlines/timeouts?
  • How do you implement pagination as an alternative to server-streaming for large result sets?

4. How does gRPC handle load balancing, and what are the differences between client-side and proxy-based approaches?

What the interviewer is really asking: Do you understand why HTTP/2 multiplexing complicates traditional load balancing, and can you design a load balancing strategy for gRPC services in production?

Answer framework:

Traditional L4 (TCP-level) load balancers distribute connections across backends. With HTTP/1.1, each connection carries one request at a time, so distributing connections effectively distributes load. With gRPC on HTTP/2, a single connection carries thousands of concurrent RPCs. An L4 load balancer assigns the connection to one backend, and all subsequent RPCs go to that same backend. This makes L4 load balancing ineffective for gRPC.

Solution 1: L7 (application-level) proxy-based load balancing. A proxy (Envoy, Nginx with gRPC support, or a cloud load balancer like GCP's gRPC-aware LB) terminates the HTTP/2 connection from the client, inspects each gRPC call (reading the path header to determine the service/method), and routes each individual RPC to a potentially different backend. The proxy maintains its own HTTP/2 connections to backends. Advantages: simple client configuration (clients connect to one address), centralized policy enforcement (rate limiting, retries). Disadvantages: added latency hop, proxy becomes a potential bottleneck, and the proxy must handle all the traffic.

Solution 2: Client-side load balancing. The client maintains connections to multiple backends and distributes RPCs directly. The client discovers backends through a service registry (DNS, Consul, etcd, Kubernetes endpoints) and applies a load balancing algorithm (round-robin, least-connections, weighted). Libraries like grpc-go and grpc-java have built-in support. The xDS protocol (from Envoy) standardizes client-side LB configuration. Advantages: no extra hop, highest performance, no proxy bottleneck. Disadvantages: complex client configuration, load balancing logic in every client, harder to enforce centralized policies.
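As a sketch of the client-side approach in grpc-go (the target name is hypothetical), the DNS resolver discovers all backend addresses and a round_robin policy spreads RPCs across per-backend subchannels:

```go
package main

import (
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// "dns:///" makes gRPC resolve every address behind the name and treat
	// each as a backend; round_robin then distributes RPCs across them
	// instead of pinning all traffic to a single connection.
	conn, err := grpc.Dial(
		"dns:///orders.internal.example.com:50051", // hypothetical service name
		grpc.WithTransportCredentials(insecure.NewCredentials()), // use TLS in production
		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	// ... create stubs and issue RPCs as usual
}
```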

Solution 3: Service mesh (sidecar proxy). A sidecar proxy (Envoy in Istio/Linkerd) runs alongside each service. The application connects to localhost, the sidecar handles service discovery and load balancing. This combines the simplicity of proxy-based LB (applications do not implement LB) with the performance benefits of decentralization (no central proxy bottleneck). This is the dominant pattern in Kubernetes environments.

Discuss look-aside load balancing: the client consults an external LB service for backend addresses and weights, then connects directly to backends. Google's internal architecture uses this pattern with a global load balancer service.

For streaming RPCs specifically: once a stream is established, it is pinned to one backend for its duration. Long-lived streams can cause load imbalance. Solutions include limiting stream duration and reconnecting, using connection-level (not stream-level) load reporting for rebalancing decisions, and graceful drain signals (GOAWAY frame) to migrate streams during deployments. For more on load balancing in distributed systems, see our API gateway concepts.

Follow-up questions:

  • How do you handle backend health checking in a client-side load balancing setup?
  • What is the impact of connection warm-up time on load distribution after scaling events?
  • How do you implement weighted load balancing for canary deployments with gRPC?

5. How do you implement proper error handling and status codes in gRPC?

What the interviewer is really asking: Do you understand gRPC's error model beyond simple status codes, and can you design error handling that enables efficient debugging without leaking implementation details?

Answer framework:

gRPC defines 17 status codes (OK, CANCELLED, UNKNOWN, INVALID_ARGUMENT, DEADLINE_EXCEEDED, NOT_FOUND, ALREADY_EXISTS, PERMISSION_DENIED, RESOURCE_EXHAUSTED, FAILED_PRECONDITION, ABORTED, OUT_OF_RANGE, UNIMPLEMENTED, INTERNAL, UNAVAILABLE, DATA_LOSS, UNAUTHENTICATED). Each code has specific semantics that determine retry behavior. Clients should retry on UNAVAILABLE (transient connectivity issue), DEADLINE_EXCEEDED (potentially), and ABORTED (transaction conflict). They should not retry on INVALID_ARGUMENT (request is wrong), NOT_FOUND (resource does not exist), or PERMISSION_DENIED (auth will not change).

Beyond status codes, gRPC supports rich error details through the google.rpc.Status proto message and its details field. This carries structured error information: BadRequest (field violations with specific field paths), PreconditionFailure (which precondition failed), QuotaFailure (which quota was exceeded), ErrorInfo (domain-specific error code and metadata), RetryInfo (when to retry), and DebugInfo (stack trace, for internal use only). Use these structured details to provide actionable error information to clients.
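A minimal Go sketch of attaching a structured BadRequest detail to an INVALID_ARGUMENT status, using the standard errdetails types (the field name is illustrative):

```go
package main

import (
	"google.golang.org/genproto/googleapis/rpc/errdetails"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// invalidEmailError builds a status that carries a machine-readable
// field violation alongside the human-readable message.
func invalidEmailError() error {
	st := status.New(codes.InvalidArgument, "invalid request")
	detailed, err := st.WithDetails(&errdetails.BadRequest{
		FieldViolations: []*errdetails.BadRequest_FieldViolation{{
			Field:       "user.email",
			Description: "must be a valid email address",
		}},
	})
	if err != nil {
		// Attaching details failed; fall back to the plain status.
		return st.Err()
	}
	return detailed.Err()
}
```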

Design principles for gRPC error handling. First, choose the most specific status code. Using INTERNAL for everything makes debugging impossible. FAILED_PRECONDITION vs INVALID_ARGUMENT: the former means the request could succeed if system state changes, the latter means the request is inherently invalid. Second, include structured error details for client-actionable errors. If a field validation fails, include BadRequest with the specific field path and violation description. Third, never expose internal details to external clients. Stack traces, database errors, and internal service names go in logs, not error responses. Fourth, propagate context: when service A calls service B and B fails, service A should either translate the error to its own domain or propagate the status code with appropriate wrapping.

Discuss error handling in streaming: a server-streaming RPC can fail after it has already sent some messages (the error arrives in the trailing status) or before it sends any messages (the client receives only the error). Client code must handle both cases. For bidirectional streams, either side can terminate the stream with an error at any time.

For production observability: log errors with correlation IDs (propagated via gRPC metadata), track error rates by status code per service/method, alert on elevated error rates (especially INTERNAL and UNAVAILABLE which indicate system issues versus client errors like INVALID_ARGUMENT).

Discuss the relationship between gRPC errors and HTTP status codes: when gRPC is accessed through a gateway (like grpc-gateway for REST transcoding), status codes map to HTTP codes (NOT_FOUND maps to 404, INVALID_ARGUMENT to 400). Understanding this mapping matters for APIs that serve both gRPC and REST clients. Companies like Google publish detailed error handling guidelines for their gRPC APIs.

Follow-up questions:

  • How do you handle partial failures in a batch RPC where some items succeed and others fail?
  • What is your strategy for error code standardization across a microservice organization?
  • How do you implement automatic retry with backoff for retriable gRPC errors?

6. How do you secure gRPC services in production?

What the interviewer is really asking: Do you understand TLS configuration, mutual authentication, token-based auth, and the operational challenges of certificate management in a gRPC microservice architecture?

Answer framework:

Security for gRPC operates at three layers: transport security (encryption), authentication (identity verification), and authorization (permission enforcement).

Transport security: gRPC strongly encourages TLS for all production traffic. Unlike REST where TLS termination at a reverse proxy is common, gRPC services often require end-to-end encryption (especially in zero-trust architectures). Configure TLS with modern cipher suites, TLS 1.3 preferably. For internal services, use mutual TLS (mTLS) where both client and server present certificates, establishing bidirectional identity verification.

Certificate management at scale: manually managing certificates for hundreds of services is infeasible. Use automated certificate infrastructure: SPIFFE/SPIRE for workload identity (each service gets a short-lived X.509 SVID automatically), service mesh certificate rotation (Istio/Linkerd automate mTLS between all services), or internal CAs with automatic renewal (Vault PKI, cert-manager in Kubernetes). Certificates should be short-lived (hours to days, not years) and automatically rotated.

Authentication mechanisms: for service-to-service, mTLS provides identity inherently (the client certificate identifies the calling service). For user-to-service, pass JWT or OAuth2 tokens in gRPC metadata (equivalent to HTTP headers). Implement call credentials that attach tokens to every outgoing call. Use per-RPC credentials so tokens can be refreshed without reconnecting.

Authorization: after authenticating the caller, determine if they are allowed to call this specific method with these specific parameters. Implement authorization as a gRPC interceptor (middleware) that checks policies before the request reaches the handler. Policies can be: ACL-based (service A can call methods X and Y on service B), RBAC-based (users with role admin can call management methods), or ABAC-based (authorization depends on request attributes). Use OPA (Open Policy Agent) for complex policy evaluation.

Discuss the channel credentials vs call credentials distinction in gRPC. Channel credentials (TLS) secure the connection. Call credentials (tokens) authenticate individual RPCs. They compose: a channel can have TLS plus per-call JWT tokens.
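A minimal Go sketch of that composition: channel credentials (TLS) secure the connection, and a PerRPCCredentials implementation attaches a bearer token to every call (the token source is a placeholder):

```go
package main

import (
	"context"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
)

// bearerCreds implements credentials.PerRPCCredentials, attaching a token
// to the metadata of every outgoing RPC on the channel.
type bearerCreds struct {
	tokenSource func(ctx context.Context) (string, error) // hypothetical token provider
}

func (c bearerCreds) GetRequestMetadata(ctx context.Context, uri ...string) (map[string]string, error) {
	tok, err := c.tokenSource(ctx) // fetched per call, so rotation needs no reconnect
	if err != nil {
		return nil, err
	}
	return map[string]string{"authorization": "Bearer " + tok}, nil
}

// RequireTransportSecurity forces TLS so tokens never travel in plaintext.
func (c bearerCreds) RequireTransportSecurity() bool { return true }

func dial(addr string, tls credentials.TransportCredentials, creds bearerCreds) (*grpc.ClientConn, error) {
	return grpc.Dial(addr,
		grpc.WithTransportCredentials(tls), // channel credentials: encrypt the connection
		grpc.WithPerRPCCredentials(creds),  // call credentials: authenticate each RPC
	)
}
```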

For defense in depth: even with mTLS, validate inputs rigorously. Implement rate limiting per caller identity. Use network policies (Kubernetes NetworkPolicy) to restrict which services can communicate. Log all authentication failures for security monitoring.

Follow-up questions:

  • How do you handle certificate rotation without service downtime?
  • What is your strategy for securing gRPC services that are exposed to the public internet?
  • How do you audit and log authorization decisions for compliance?

7. How do you implement observability (tracing, metrics, logging) for gRPC services?

What the interviewer is really asking: Can you instrument gRPC services comprehensively, propagate context across service boundaries, and build operational dashboards that enable rapid incident response?

Answer framework:

Observability for gRPC requires instrumentation at three levels: metrics (quantitative measurements), tracing (request flow across services), and logging (detailed event records).

Metrics: gRPC interceptors (middleware) emit metrics for every RPC. Essential metrics per service/method: request rate (RPCs per second), error rate (by status code), latency distribution (p50, p95, p99), and in-flight RPCs. For streaming: messages sent/received per stream, stream duration, and stream error rate. Use OpenTelemetry or Prometheus client libraries. The grpc-ecosystem provides pre-built Prometheus interceptors. Build dashboards showing RED metrics (Rate, Errors, Duration) per service and per method.

Distributed tracing: propagate trace context (trace ID, span ID) via gRPC metadata headers. OpenTelemetry defines the standard propagation format (traceparent header). Each service creates a span for each incoming RPC and propagates the trace context to outgoing RPCs. This builds a complete trace tree showing how a request flows across services, where time is spent, and where errors occur. Implement as a gRPC interceptor that automatically creates spans for all RPCs without modifying business logic.

Logging: structured logging with correlation. Include the trace ID in every log entry so logs can be correlated with traces. Log at appropriate levels: DEBUG for request/response bodies (in non-production), INFO for significant business events, WARN for recoverable errors, ERROR for failures requiring attention. Use gRPC interceptors to log request metadata, status codes, and latency for every RPC without cluttering business logic code.

Context propagation patterns: gRPC metadata serves as the carrier for cross-cutting concerns. Beyond trace context, propagate: request ID (for correlation across all telemetry), deadline/timeout (remaining time budget), caller identity (for authorization and attribution), and feature flags (for request-scoped experiments).
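A sketch of this propagation as a Go unary server interceptor (the x-request-id header name is a convention assumed here, not a gRPC standard):

```go
package main

import (
	"context"

	"github.com/google/uuid"
	"google.golang.org/grpc"
	"google.golang.org/grpc/metadata"
)

const requestIDHeader = "x-request-id" // assumed header convention

// requestIDInterceptor ensures every request has a correlation ID and makes it
// available to outgoing gRPC calls (and log statements) via the context.
func requestIDInterceptor(
	ctx context.Context,
	req interface{},
	info *grpc.UnaryServerInfo,
	handler grpc.UnaryHandler,
) (interface{}, error) {
	id := ""
	if md, ok := metadata.FromIncomingContext(ctx); ok {
		if vals := md.Get(requestIDHeader); len(vals) > 0 {
			id = vals[0]
		}
	}
	if id == "" {
		id = uuid.NewString() // first hop: mint a new correlation ID
	}
	// Attach to outgoing metadata so downstream gRPC calls carry the same ID.
	ctx = metadata.AppendToOutgoingContext(ctx, requestIDHeader, id)
	return handler(ctx, req)
}
```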

For production operations, implement: service dependency graphs (derived from tracing data showing which services call which), error budget tracking (SLO compliance per service/method), latency regression detection (alert when p99 increases by 20% compared to previous week), and capacity planning metrics (RPC rate trends, resource utilization per RPC). For a deeper understanding of monitoring in distributed systems, see our system design interview guide.

Discuss gRPC-specific observability challenges: long-lived streaming connections make request-level metrics less meaningful (what is the latency of a stream that lasts hours?), HTTP/2 multiplexing means connection-level metrics do not reflect individual RPC performance, and binary protobuf payloads require schema-aware logging (you cannot just log the raw bytes).

Follow-up questions:

  • How do you trace a request that involves both synchronous gRPC calls and asynchronous message queue processing?
  • What is your strategy for sampling in high-throughput gRPC services?
  • How do you implement SLO-based alerting for gRPC services?

8. How do you handle gRPC service versioning and breaking changes?

What the interviewer is really asking: Do you understand the difference between wire-compatible and semantically breaking changes, and can you manage API evolution across a large organization with multiple teams and deployment schedules?

Answer framework:

gRPC versioning operates at two levels: the wire format (protobuf binary compatibility) and the service contract (semantic API behavior).

For wire-compatible evolution (non-breaking changes): adding new fields to messages, adding new methods to a service, adding new enum values, and deprecating (but not removing) fields/methods. Old clients ignore new fields, new clients handle missing fields gracefully via defaults. This is the preferred evolution path and should cover 80% of API changes.

For breaking changes (wire-incompatible or semantically breaking): changing field types, removing fields, renaming services, changing method signatures, or altering the semantic behavior of an existing method in ways clients depend on. These require explicit versioning.

Versioning strategies. Package versioning: use protobuf package names like company.service.v1, company.service.v2. Each major version is a separate service definition, potentially served on the same server. Clients explicitly choose their version. This is Google's recommended approach and what they use for public APIs.

Method versioning: add new method versions (GetUserV2) to the same service. Simpler but leads to API pollution over time. Avoid this for significant changes.

Header-based versioning: use gRPC metadata to indicate the API version. The server dispatches to appropriate handlers. Flexible but loses the type safety benefits of protobuf.

For managing transitions: implement the old and new versions simultaneously on the same server. Old clients continue calling v1 while new clients migrate to v2. Set a sunset date for v1. Monitor traffic to v1 and reach out to teams that have not migrated. Use server reflection to advertise supported versions.

For schema governance: maintain a centralized protobuf repository (for example, the Buf Schema Registry at buf.build or a proto directory in a monorepo). Require PR reviews for proto changes. Run buf breaking in CI to detect unintentional breaking changes. Generate and publish client libraries for all supported languages automatically. At organizations like Google, protobuf schema changes go through rigorous review processes.

Discuss the relationship between service versioning and deployment: with Kubernetes, you can run v1 and v2 as separate deployments with traffic routing. With gRPC's ability to serve multiple service versions on one port, a single deployment can handle both.
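A short Go sketch of the single-deployment option, registering hypothetical v1 and v2 generated services on the same port:

```go
package main

import (
	"log"
	"net"

	"google.golang.org/grpc"

	orderv1 "example.com/gen/order/v1" // hypothetical generated packages
	orderv2 "example.com/gen/order/v2"
)

type v1Handler struct{ orderv1.UnimplementedOrderServiceServer }
type v2Handler struct{ orderv2.UnimplementedOrderServiceServer }

func main() {
	lis, err := net.Listen("tcp", ":50051")
	if err != nil {
		log.Fatal(err)
	}

	s := grpc.NewServer()
	// The two versions are distinct services with different fully-qualified
	// names, so old clients keep calling order.v1.OrderService while new
	// clients migrate to order.v2.OrderService on the same port.
	orderv1.RegisterOrderServiceServer(s, &v1Handler{})
	orderv2.RegisterOrderServiceServer(s, &v2Handler{})

	if err := s.Serve(lis); err != nil {
		log.Fatal(err)
	}
}
```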

Follow-up questions:

  • How do you handle a breaking change that must be deployed across 50 services simultaneously?
  • What is your strategy for deprecation communication and enforcement?
  • How do you test backward compatibility in CI?

9. How do you implement deadlines, timeouts, and cancellation propagation in gRPC?

What the interviewer is really asking: Do you understand gRPC's deadline propagation model, why it matters for preventing cascade failures, and how to configure timeouts correctly in a deep service call chain?

Answer framework:

gRPC's deadline mechanism is one of its most important features for building reliable distributed systems. A deadline is an absolute timestamp (not a relative duration) that propagates through the entire call chain. When client A calls service B with a 5-second deadline, and service B calls service C, service C inherits the remaining deadline (say, 4.8 seconds after B spent 200ms processing). If the deadline expires at any point in the chain, all pending work is cancelled.

Why deadlines matter: without deadlines, a slow downstream service causes all upstream services to accumulate blocked threads/goroutines waiting for responses. This cascades into resource exhaustion. With deadlines, the entire request fails fast when it cannot complete within its time budget, freeing resources immediately.

Deadline vs timeout: a timeout is a relative duration (5 seconds from now). gRPC converts timeouts to absolute deadlines internally (current time plus timeout). Absolute deadlines are essential for propagation because each hop subtracts its processing time automatically.

Configuring deadlines in practice: set deadlines at the edge (the first service that receives the user request). Internal services should inherit and respect incoming deadlines rather than imposing their own. If a service must call multiple downstream services, check the remaining deadline before each call and skip non-essential calls if insufficient time remains.

Cancellation propagation: when a deadline expires or a client cancels a call, gRPC sends an RST_STREAM frame on the HTTP/2 stream. The server's context is cancelled, and well-written server code checks context cancellation to abort expensive work immediately (database queries, computation). In Go, this maps to context.Context cancellation. In Java, it is the CancellationListener.
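A minimal Go handler sketch (the pb types and downstream Warehouse client are assumptions) showing deadline inheritance, a remaining-budget check, and cancellation-aware work:

```go
package main

import (
	"context"
	"time"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"

	pb "example.com/gen/reportpb" // hypothetical generated package
)

type server struct {
	pb.UnimplementedReportServiceServer
	warehouse pb.WarehouseClient // hypothetical downstream service
}

func (s *server) BuildReport(ctx context.Context, req *pb.BuildReportRequest) (*pb.Report, error) {
	// The incoming ctx already carries the caller's deadline; passing it to the
	// downstream call propagates the remaining budget automatically.
	if dl, ok := ctx.Deadline(); ok && time.Until(dl) < 50*time.Millisecond {
		// Not enough budget left to do useful work: fail fast instead of timing out downstream.
		return nil, status.Error(codes.DeadlineExceeded, "insufficient time budget")
	}

	rows, err := s.warehouse.Query(ctx, &pb.QueryRequest{Sql: req.GetQuery()})
	if err != nil {
		return nil, err
	}

	report := &pb.Report{}
	for _, row := range rows.GetRows() {
		// Abort expensive local work as soon as the caller gives up.
		if ctx.Err() != nil {
			return nil, status.FromContextError(ctx.Err()).Err()
		}
		report.Lines = append(report.Lines, summarize(row))
	}
	return report, nil
}

func summarize(r *pb.Row) string { return "" } // placeholder
```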

Common mistakes: (1) Not setting any deadline, causing requests to hang indefinitely when a downstream service is unreachable. (2) Setting identical timeouts at each layer (5s timeout at gateway, 5s at service B, 5s at service C) which means the chain can take 15 seconds despite the user's intent of 5 seconds. Always inherit the incoming deadline. (3) Not checking for context cancellation in long-running server operations, wasting resources on work whose result will be discarded. (4) Setting deadlines too aggressively, causing healthy requests to fail during normal latency variations.

For production configuration: set deadlines based on p99 latency plus a buffer. Monitor deadline exceeded rates per method. A spike in DEADLINE_EXCEEDED indicates either degraded performance or misconfigured deadlines. Use adaptive timeouts that adjust based on observed latency percentiles.

Follow-up questions:

  • How do you handle a situation where the remaining deadline is too short for a required downstream call?
  • What is the interaction between deadlines and retries?
  • How do you propagate deadlines across asynchronous boundaries (message queues)?

10. How would you design a gRPC API for a real-time data streaming platform?

What the interviewer is really asking: Can you apply gRPC's streaming patterns to a concrete problem, handling backpressure, ordering, fan-out, and failure recovery for continuous data flows?

Answer framework:

Consider a platform that streams financial market data (stock prices, trades, order book updates) to thousands of subscribers. This is a classic use case for server-streaming and bidirectional streaming gRPC.

API design: define a service with streaming methods. SubscribeMarketData(SubscriptionRequest) returns stream MarketDataUpdate for simple subscription. For dynamic subscriptions where clients can add/remove symbols without reconnecting, use bidirectional streaming: StreamMarketData(stream SubscriptionCommand) returns stream MarketDataUpdate.

The SubscriptionRequest includes the symbols to subscribe to, the data types (trades, quotes, depth), and optional filters (minimum price change threshold). The MarketDataUpdate is a oneof of trade, quote, and depth update messages. Use oneof rather than a generic wrapper to maintain type safety.

Backpressure handling: if a slow client cannot consume updates fast enough, the server must not buffer unboundedly. Implement a per-client buffer with configurable size. When the buffer fills: option 1 (lossy) drops oldest messages and sends a gap notification so the client knows it missed data. Option 2 (lossless) pauses sending and relies on HTTP/2 flow control to propagate backpressure to the data source. The choice depends on the use case: market data can tolerate loss (latest price matters more than history), but trade execution streams cannot.

Ordering guarantees: within a single gRPC stream, messages are delivered in order (HTTP/2 guarantees per-stream ordering). For multi-stream scenarios (one stream per symbol), cross-stream ordering requires sequence numbers or timestamps that clients use to reconstruct total order.

Fan-out architecture: the data source (exchange feed handler) publishes updates to an internal pub/sub system. Each gRPC server instance subscribes to relevant topics and forwards updates to connected clients. This decouples the feed handler from client connections. Use Kafka or NATS for the internal pub/sub layer.

Failure recovery: when a client disconnects and reconnects, it should be able to resume from where it left off. Include sequence numbers in every message. On reconnection, the client sends the last received sequence number, and the server replays from that point (if data is still in buffer) or sends a snapshot followed by incremental updates.

Discuss connection management: implement keepalive pings to detect dead connections quickly (especially important for mobile clients traversing NAT). Set appropriate keepalive parameters: time (how often to ping), timeout (how long to wait for response), and permit_without_calls (send pings even on idle connections to prevent NAT timeout). For broader streaming architecture patterns, explore our how gRPC works guide.
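A hedged grpc-go sketch of those keepalive parameters on the client side (the values are illustrative, and the server's keepalive enforcement policy must permit pings at this rate):

```go
package main

import (
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/keepalive"
)

func main() {
	kp := keepalive.ClientParameters{
		Time:                30 * time.Second, // ping if the connection has been idle this long
		Timeout:             10 * time.Second, // declare the connection dead if no ack in time
		PermitWithoutStream: true,             // ping even with no active RPCs (keeps NAT mappings alive)
	}
	conn, err := grpc.Dial(
		"marketdata.internal.example.com:50051", // hypothetical endpoint
		grpc.WithTransportCredentials(insecure.NewCredentials()), // use TLS in production
		grpc.WithKeepaliveParams(kp),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	// ... open the StreamMarketData bidirectional stream here
}
```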

Follow-up questions:

  • How do you handle a client that subscribes to 10,000 symbols and receives updates at a combined rate of 100,000 messages per second?
  • What is your strategy for load balancing when streaming connections are long-lived?
  • How do you test a streaming system for correctness under message loss and reordering?

11. How do you implement gRPC interceptors (middleware), and what patterns do you use them for?

What the interviewer is really asking: Do you understand the interceptor chain pattern in gRPC, how it differs from HTTP middleware, and can you design reusable cross-cutting concerns without polluting business logic?

Answer framework:

gRPC interceptors are the equivalent of HTTP middleware: they wrap RPC handlers and can inspect/modify requests, responses, and metadata. Unlike HTTP middleware, gRPC interceptors are typed for unary and streaming separately because their interfaces differ.

In Go (grpc-go), unary interceptors have the signature: func(ctx, req, info, handler) (resp, error). The interceptor receives the context, request, method info, and the next handler in the chain. It can modify the context, inspect the request, call the handler, and inspect/modify the response. Streaming interceptors wrap the stream object itself.
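For example, a minimal Go authentication interceptor with that signature (token verification is a placeholder); it would be registered with grpc.NewServer(grpc.ChainUnaryInterceptor(...)) alongside the other interceptors listed below:

```go
package main

import (
	"context"
	"strings"

	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/metadata"
	"google.golang.org/grpc/status"
)

type identityKey struct{}

// authUnaryInterceptor rejects calls without a valid bearer token and attaches
// the authenticated identity to the context for downstream handlers.
func authUnaryInterceptor(
	ctx context.Context,
	req interface{},
	info *grpc.UnaryServerInfo,
	handler grpc.UnaryHandler,
) (interface{}, error) {
	md, ok := metadata.FromIncomingContext(ctx)
	if !ok {
		return nil, status.Error(codes.Unauthenticated, "missing metadata")
	}
	auth := md.Get("authorization")
	if len(auth) == 0 || !strings.HasPrefix(auth[0], "Bearer ") {
		return nil, status.Error(codes.Unauthenticated, "missing bearer token")
	}
	identity, err := verifyToken(strings.TrimPrefix(auth[0], "Bearer ")) // hypothetical JWT verification
	if err != nil {
		return nil, status.Error(codes.Unauthenticated, "invalid token")
	}
	// Returning early above short-circuits the chain; calling handler proceeds
	// with a context enriched with the caller's identity.
	return handler(context.WithValue(ctx, identityKey{}, identity), req)
}

func verifyToken(raw string) (string, error) { return "svc-a", nil } // placeholder
```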

In Java (grpc-java), interceptors implement ServerInterceptor with an interceptCall method that returns a ServerCall.Listener, enabling interception of both the call and individual messages.

Common interceptor patterns:

  1. Authentication: extract credentials from metadata, validate (verify JWT signature, check token expiry), attach the authenticated identity to the context, reject unauthenticated calls with UNAUTHENTICATED status.

  2. Logging: log the method name, caller identity, request size, response size, status code, and latency for every RPC. Use structured logging with trace ID from context.

  3. Metrics: record request count, latency histogram, and error rate per method. Emit to Prometheus/OpenTelemetry.

  4. Tracing: extract incoming trace context from metadata, create a server span, inject trace context into outgoing calls. This integrates with OpenTelemetry automatically.

  5. Recovery (panic handler): catch panics in handlers, log the stack trace, return INTERNAL status instead of crashing the process.

  6. Validation: validate incoming protobuf messages against constraints (buf validate, protoc-gen-validate). Reject invalid requests with INVALID_ARGUMENT before they reach business logic.

  7. Rate limiting: count RPCs per caller identity per time window. Return RESOURCE_EXHAUSTED when limits are exceeded.

  8. Timeout enforcement: if no deadline is set by the client, apply a default server-side deadline to prevent unbounded requests.

Interceptor ordering matters: authentication should run before authorization, which should run before validation, which should run before business logic. Logging and metrics should wrap the entire chain to capture all outcomes.

Discuss the chain execution model: interceptors form a chain where each can short-circuit (return an error without calling the next handler) or proceed (call the next handler and optionally modify its result). This is identical to middleware in concept but implemented differently per language.

Follow-up questions:

  • How do you test interceptors in isolation?
  • What is the performance impact of a long interceptor chain?
  • How do you handle interceptors that need to modify streaming messages?

12. When would you choose gRPC over REST or GraphQL, and when would you not?

What the interviewer is really asking: Can you make architectural decisions about API paradigms based on concrete trade-offs rather than hype, and do you understand the contexts where each paradigm excels?

Answer framework:

Choose gRPC when: (1) Performance is critical. gRPC's binary serialization and HTTP/2 multiplexing deliver 5-10x throughput over JSON/REST with 2-5x latency reduction. For service-to-service communication handling thousands of RPCs per second, this matters. (2) Strong contracts are essential. Protobuf schemas provide compile-time type safety, auto-generated clients, and clear documentation. In a microservice architecture with 50+ services, this prevents integration bugs. (3) Streaming is required. gRPC's native streaming support is far superior to REST workarounds (WebSocket, SSE, long-polling). (4) Polyglot environments. gRPC's code generation supports 10+ languages from a single proto definition, ensuring consistent behavior across services written in different languages.

Choose REST when: (1) Public-facing APIs where developer experience matters. REST's ubiquity means every developer knows how to call it. Tools like curl, Postman, and browser dev tools work natively. gRPC requires specialized tooling. (2) Simple CRUD applications without performance constraints. REST's simplicity reduces overhead. (3) Browser clients without a proxy. While grpc-web exists, it requires a proxy layer and does not support all features. REST works directly in browsers.

Choose GraphQL when: (1) Diverse client needs. Mobile, web, and third-party clients need different subsets of the same data. GraphQL's field selection eliminates over-fetching. See our REST vs GraphQL comparison. (2) Rapid frontend iteration. Frontend teams can modify queries without backend changes. (3) API aggregation. GraphQL naturally composes data from multiple backend services into a single response.

Do NOT choose gRPC when: (1) You need browser-native support without additional infrastructure. (2) Human readability of API calls matters (debugging, logging). (3) Your organization lacks protobuf expertise and tooling investment. (4) You are building a public API where adoption friction matters.

The hybrid pattern is common in production: gRPC for internal service-to-service communication, GraphQL or REST at the API gateway for client-facing APIs. The gateway translates between GraphQL/REST and internal gRPC calls. See our REST vs gRPC and GraphQL vs gRPC comparisons for detailed technical trade-off analysis.

Follow-up questions:

  • How do you expose a gRPC service to web clients that need browser support?
  • What is your strategy for an organization migrating from REST to gRPC incrementally?
  • In what scenarios would you use both gRPC and GraphQL in the same system?

13. How do you implement retry logic and resilience patterns for gRPC clients?

What the interviewer is really asking: Do you understand the nuances of safe retries in distributed systems, hedged requests, circuit breaking, and how these patterns prevent cascading failures versus amplifying them?

Answer framework:

Retries in distributed systems are a double-edged sword: they improve individual request success rates but can amplify failures (retry storms). gRPC provides built-in retry support and also allows application-level implementation.

gRPC's built-in retry policy (service config): configured per method with maxAttempts, initialBackoff, maxBackoff, backoffMultiplier, and retryableStatusCodes. The client automatically retries failed RPCs matching the configured status codes with exponential backoff. This is transparent to application code. Critical constraint: retries are only safe for idempotent operations. A non-idempotent mutation retried after a network timeout might execute twice.
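A hedged grpc-go sketch of enabling this built-in retry policy through the default service config (the service name is hypothetical, and only idempotent methods should be covered by such a policy):

```go
package main

import (
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// Retry UNAVAILABLE up to 3 attempts total, with exponential backoff.
const retryServiceConfig = `{
  "methodConfig": [{
    "name": [{"service": "inventory.v1.InventoryService"}],
    "retryPolicy": {
      "maxAttempts": 3,
      "initialBackoff": "0.1s",
      "maxBackoff": "1s",
      "backoffMultiplier": 2.0,
      "retryableStatusCodes": ["UNAVAILABLE"]
    }
  }]
}`

func main() {
	conn, err := grpc.Dial(
		"inventory.internal.example.com:50051", // hypothetical endpoint
		grpc.WithTransportCredentials(insecure.NewCredentials()), // use TLS in production
		grpc.WithDefaultServiceConfig(retryServiceConfig),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	// ... retries now happen transparently for matching methods
}
```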

Hedged requests: send the same request to multiple backends simultaneously and use the first response. This reduces tail latency (p99) dramatically because a single slow backend does not affect the client. gRPC supports hedged requests in its service config. Configure hedgingDelay: wait this long before sending a backup request. This avoids doubling load for fast responses while protecting against slow ones. Only use for idempotent read operations.

Retry budgets: limit the proportion of traffic that consists of retries. If more than 10% of outgoing RPCs are retries, stop retrying and let requests fail. This prevents retry storms where a degraded service receives 3x normal traffic from retries, making the degradation worse. Implement as a token bucket shared across all RPC calls to a given service.

Circuit breaking: if a downstream service has a high error rate, stop calling it entirely for a cooldown period. States: CLOSED (normal, requests flow), OPEN (all requests fail immediately without calling the service), HALF-OPEN (allow a few probe requests to check recovery). Configure thresholds: open the circuit after N failures in M seconds. This protects the client from wasting resources on a known-bad service and gives the downstream service breathing room to recover.

Timeout-retry interaction: when retrying with deadlines, ensure the total time across all attempts fits within the original deadline. If the deadline is 5 seconds and the first attempt takes 3 seconds before failing, the retry only has 2 seconds. gRPC's built-in retry mechanism handles this automatically by checking the remaining deadline before each attempt.

Client-side load shedding: when the client is overwhelmed with outgoing requests, proactively drop lower-priority requests rather than queueing them indefinitely. Prioritize requests based on business criticality.

For production configuration: start conservative (2-3 max retries, 100ms initial backoff, only retry UNAVAILABLE). Monitor retry rates and adjust. Alert when retry rates exceed thresholds. For patterns used at scale at companies like Google, see our learning paths.

Follow-up questions:

  • How do you implement retries for streaming RPCs?
  • What is the interaction between client retries and server-side rate limiting?
  • How do you distinguish between transient failures (worth retrying) and permanent failures (not worth retrying)?

14. How do you design a gRPC service for a multi-region deployment?

What the interviewer is really asking: Can you handle the complexity of global service deployment including regional routing, data residency, failover, and the fundamental tension between consistency and latency across geographic distances?

Answer framework:

Multi-region gRPC deployments face three core challenges: routing clients to the nearest region for low latency, handling failover when a region degrades, and maintaining data consistency across regions.

For routing: use DNS-based geographic routing (GeoDNS or cloud provider solutions like AWS Route53 latency-based routing) to direct clients to the nearest region. For gRPC specifically, configure client-side load balancing with region-aware priorities: prefer local region, failover to adjacent regions. The xDS protocol supports priority-based locality routing natively.

For failover: implement health checking at the regional level. If a region becomes unhealthy (elevated error rates, high latency), redirect traffic to other regions. With DNS-based routing, lower TTLs enable faster failover but increase DNS query load. With client-side balancing, health signals propagate faster. Implement graceful degradation: a region might be partially healthy, serving read traffic but not writes.

For data consistency: this is the fundamental challenge. Options span the consistency spectrum. Strong consistency (all regions see the same data simultaneously): use a globally consistent database like Spanner. Writes go to a leader region and replicate synchronously. High write latency from non-leader regions (cross-region round trip). Eventual consistency: each region has a local database replica. Writes are processed locally and replicated asynchronously. Low latency but clients in different regions see different data temporarily. Conflict resolution (last-write-wins, application-specific merge) is required.

For gRPC service design: separate read and write paths. Reads can serve from local replicas (eventual consistency acceptable for most read operations). Writes route to the leader region or use conflict-free data structures (CRDTs) for multi-leader writes. Implement the service to accept a consistency_level parameter: clients that need strong consistency explicitly request it and accept the latency cost.

Discuss session affinity: once a client connects to a region, keep it there for consistency within a session. Use gRPC metadata to carry session identifiers that the load balancer uses for affinity routing.

Address deployment strategy: deploy new versions region by region (rolling deployment across regions). Start with a canary region, monitor, then expand. Implement feature flags that can disable new functionality per region. Ensure protobuf schema compatibility between regions running different versions during the rollout window.

Multi-region deployment is also a key consideration in global system design exercises such as the URL shortener. Understanding these patterns is essential for staff-level architecture discussions.

Follow-up questions:

  • How do you handle a situation where one region has stale data and a client observes inconsistency?
  • What is your strategy for testing failover scenarios?
  • How do you handle data residency requirements where certain data must stay in specific regions?

15. How do you performance test and optimize a gRPC service?

What the interviewer is really asking: Do you have a systematic approach to identifying and resolving performance bottlenecks in gRPC services, including serialization cost, connection management, and server resource utilization?

Answer framework:

Performance testing for gRPC requires specialized tooling because standard HTTP load testing tools (wrk, ab) do not speak gRPC's binary protocol. Use purpose-built tools: ghz (gRPC benchmarking tool), locust with gRPC support, or custom clients using the gRPC library's built-in benchmarking utilities.

Load testing methodology: start with single-client baseline to establish per-request latency without contention. Then increase concurrency gradually, measuring throughput (RPCs/second), latency percentiles (p50, p95, p99), error rate, and resource utilization (CPU, memory, goroutines/threads, file descriptors). Identify the saturation point where latency degrades non-linearly.

Common bottlenecks and optimizations:

  1. Serialization cost: protobuf serialization is fast but not free. For very hot paths, pre-serialize responses that do not change per-request. Use proto.Marshal once and cache the bytes. For messages with large repeated fields, consider streaming instead of single large messages.

  2. Memory allocation: protobuf message construction allocates memory. Use sync.Pool (Go) or object pools (Java) for frequently allocated message types. Profile allocation rates and reduce GC pressure.

  3. Connection management: ensure clients reuse connections (gRPC does this by default). Monitor the number of active connections. With many clients connecting to one server, each connection consumes kernel buffers. Tune kernel parameters (net.core.somaxconn, net.ipv4.tcp_max_syn_backlog) for high connection counts.

  4. Concurrency configuration: gRPC servers use thread/goroutine pools to handle requests. Configure the pool size based on workload: CPU-bound work needs roughly one thread per core, while IO-bound work can use more. In Go, the runtime handles this with goroutines. In Java, configure the server executor thread pool.

  5. Flow control tuning: adjust HTTP/2 flow control window sizes for streaming workloads. The default window (64KB) is conservative. For high-throughput streams over high-latency connections, increase the initial window size toward the bandwidth-delay product (BDP) to avoid underutilizing the connection (see the sketch after this list).

  6. Keepalive configuration: tune keepalive parameters to balance connection freshness (detect dead connections quickly) with overhead (pings consume bandwidth). Set server-side enforcement parameters to prevent clients from pinging too aggressively.

  7. Payload optimization: minimize message sizes. Use field masks to return only requested fields. Compress large payloads (gRPC supports gzip compression per-call via metadata). However, compression trades CPU for bandwidth so profile whether it helps your specific workload.
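The sketch promised in item 5: raising the HTTP/2 flow-control windows on a grpc-go client (window sizes are illustrative and should be derived from the measured bandwidth-delay product):

```go
package main

import (
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	conn, err := grpc.Dial(
		"feed.internal.example.com:50051", // hypothetical endpoint
		grpc.WithTransportCredentials(insecure.NewCredentials()), // use TLS in production
		// The defaults (64 KiB) can throttle high-throughput streams over
		// high-latency links; raise them toward the bandwidth-delay product.
		grpc.WithInitialWindowSize(1<<20),     // 1 MiB per stream
		grpc.WithInitialConnWindowSize(8<<20), // 8 MiB per connection
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
}
```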

For production optimization: implement continuous performance testing in CI. Track per-commit latency regressions. Use profiling (pprof in Go, async-profiler in Java) to identify CPU hotspots. Monitor GC pause times. For comprehensive preparation on system performance topics, explore our learning paths and pricing plans for guided interview prep.

Follow-up questions:

  • How do you identify whether a latency issue is in serialization, network, or handler logic?
  • What is the impact of TLS on gRPC performance, and how do you minimize it?
  • How do you load test bidirectional streaming RPCs?

Common Mistakes in gRPC Interviews

  1. Treating gRPC as just faster REST. gRPC's value extends beyond performance. The type-safe contracts, streaming capabilities, deadline propagation, and interceptor patterns represent a fundamentally different programming model. Reducing gRPC to "binary REST" reveals shallow understanding.

  2. Ignoring the operational complexity of HTTP/2. HTTP/2 multiplexing changes how load balancing, connection management, and debugging work. If you describe gRPC infrastructure without addressing L7 load balancing requirements, it signals a gap in production experience.

  3. Not understanding protobuf compatibility rules. Saying you would "just add a field" without discussing field numbering, wire types, and forward/backward compatibility rules indicates you have not managed proto schemas in a multi-team environment.

  4. Forgetting about deadline propagation. gRPC's deadline mechanism is one of its strongest features for preventing cascade failures. If you discuss gRPC reliability without mentioning deadlines, you are missing a critical concept.

  5. Recommending gRPC for every use case. Senior engineers understand trade-offs. gRPC is inappropriate for browser-native APIs, public APIs where developer adoption matters, and simple applications where REST's ubiquity outweighs gRPC's performance advantages. Demonstrating when NOT to use gRPC shows maturity.

How to Prepare for gRPC Interviews

Build a real multi-service system using gRPC. Implement at least two services that communicate, with proper interceptors, streaming, error handling, and deadline propagation. Deploy with a service mesh (Istio or Linkerd) to understand production operational patterns. Experience the pain points firsthand: proto schema evolution, load balancing gotchas, debugging binary protocols.

Study Protocol Buffers deeply. Read the encoding specification to understand how varints, length-delimited fields, and wire types work. This knowledge is essential for understanding compatibility rules and performance characteristics.

Explore the gRPC ecosystem: grpc-gateway (REST transcoding), buf (proto tooling), Connect (modern gRPC alternative), and the xDS protocol (dynamic configuration). Understanding the ecosystem shows breadth.

Review how major companies use gRPC: Google's internal Stubby system that gRPC is based on, Netflix's gRPC adoption for inter-service communication, Uber's use of gRPC for their massive microservice architecture, and Square's use for mobile-to-backend communication. Reading engineering blogs from companies like Google and Meta provides production context that interview answers benefit from.

Practice explaining HTTP/2 mechanics, protobuf encoding, and streaming patterns on a whiteboard. Interviewers value candidates who can explain protocol-level details clearly. For a structured preparation plan, explore our learning paths and system design interview guide. Our pricing page has details on guided preparation programs.
