
GraphQL Interview Questions for Senior Engineers (2026)

Top GraphQL interview questions with detailed answer frameworks covering schema design, resolvers, performance optimization, federation, and production patterns used at Meta, Shopify, and GitHub.

20 min read · Updated Apr 20, 2026
Tags: interview-questions, graphql, senior-engineer, api-design

Why GraphQL Matters in Senior Engineering Interviews

GraphQL has moved well beyond its origin as a Facebook internal tool to become a dominant API paradigm for complex client-server interactions at companies like Meta, Shopify, GitHub, Airbnb, and Twitter. Senior engineering interviews increasingly test your ability to design, optimize, and operate GraphQL APIs at production scale, not merely write queries against an existing schema.

Unlike REST, GraphQL gives clients the power to request exactly the data they need in a single round trip. This flexibility introduces a new class of engineering challenges: query complexity analysis, resolver optimization, authorization at the field level, caching strategies that account for dynamic query shapes, and federation patterns that compose multiple services into a unified graph. Interviewers expect senior candidates to navigate these trade-offs with nuance, demonstrating both theoretical understanding and battle-tested production experience.

At the senior level, you are not just implementing a GraphQL server. You are making architectural decisions about schema governance, defining performance budgets, choosing between schema-first and code-first approaches, designing federation boundaries, and building observability into every layer. For a broader understanding of how GraphQL compares to alternative API paradigms, see our REST vs GraphQL comparison and GraphQL vs gRPC analysis. To understand how GraphQL fits into broader system architecture, explore our API gateway concepts and system design interview guide.

1. How does GraphQL resolve queries, and what are the performance implications of naive resolver implementations?

What the interviewer is really asking: Do you understand the resolver execution model deeply enough to identify and fix the N+1 problem, explain execution order, and reason about query planning?

Answer framework:

GraphQL executes queries by traversing the query tree top-down, invoking a resolver function for each field. The root query type has resolvers that fetch top-level resources, and each returned object type has resolvers for its fields. The execution engine calls resolvers lazily as the query tree is traversed, meaning a deeply nested query can trigger hundreds of resolver invocations.

The most critical performance implication is the N+1 problem. Consider a query that fetches a list of 50 posts and for each post resolves the author field. A naive implementation makes 1 query to fetch 50 posts plus 50 individual queries to fetch each author. This is catastrophic at scale. The standard solution is the DataLoader pattern, originally developed at Facebook. DataLoader batches individual load calls within a single execution tick into one bulk query. Instead of 50 author queries, DataLoader collects all 50 author IDs and makes a single SELECT WHERE id IN (...) query.
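
The batching mechanics can be sketched in a few lines of TypeScript. This is a toy stand-in for the real DataLoader API, with illustrative names, not the library itself:

```typescript
// Toy stand-in for DataLoader: load() calls made in the same tick are
// queued, then resolved together with one call to the batch function.
type BatchFn<K, V> = (keys: K[]) => Promise<V[]>;

class TinyLoader<K, V> {
  private queue: { key: K; resolve: (v: V) => void }[] = [];
  private scheduled = false;

  constructor(private batchFn: BatchFn<K, V>) {}

  load(key: K): Promise<V> {
    return new Promise((resolve) => {
      this.queue.push({ key, resolve });
      if (!this.scheduled) {
        this.scheduled = true;
        // Flush on the microtask queue, after the current tick's loads queue up.
        Promise.resolve().then(() => this.flush());
      }
    });
  }

  private async flush(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    this.scheduled = false;
    const values = await this.batchFn(batch.map((item) => item.key));
    batch.forEach((item, i) => item.resolve(values[i]));
  }
}

// Many author.load() calls in one tick become a single batch call,
// standing in for one SELECT ... WHERE id IN (...).
let batchCalls = 0;
const authorLoader = new TinyLoader<number, string>(async (ids) => {
  batchCalls += 1;
  return ids.map((id) => `author-${id}`);
});
```

The real DataLoader adds per-request caching and error handling on top of this scheduling core.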

Beyond DataLoader, discuss query planning. Some GraphQL servers implement look-ahead optimization where resolvers can inspect the full query AST to determine which fields will be needed downstream. This allows a resolver to eagerly join related data in a single database query rather than relying on lazy resolution. Tools like Join Monster and Hasura compile GraphQL queries directly to optimized SQL with appropriate JOINs.

Explain resolver execution order: fields at the same level execute in parallel by default in most implementations (graphql-js uses Promise.all for sibling fields). However, mutations execute serially to preserve operation ordering. Understanding this is crucial for reasoning about latency. The total query latency is determined by the longest sequential chain of resolver calls (the critical path), not the total number of resolvers.

Discuss the implications for monitoring: you need per-resolver tracing (like Apollo Studio provides) to identify slow resolvers. A single slow resolver on a frequently requested field can degrade performance for every query that touches it. Instrument resolver execution time, batch efficiency (DataLoader hit rates), and overall query execution time.

Follow-up questions:

  • How would you implement DataLoader in a serverless environment where there is no persistent process?
  • What happens when a DataLoader batch spans multiple database shards?
  • How do you handle the N+1 problem for computed fields that cannot be batched?

2. How would you design a GraphQL schema for a complex domain like an e-commerce platform?

What the interviewer is really asking: Can you make thoughtful schema design decisions that balance client ergonomics, backend feasibility, evolvability, and performance?

Answer framework:

Schema design is the most consequential decision in a GraphQL API because schemas are contracts. Once clients depend on a field, removing it requires a deprecation cycle. Start with domain modeling: identify the core entities (Product, Order, User, Cart, Review), their relationships, and the primary use cases (product browsing, checkout flow, order history).

Apply the schema-first principle: design the schema from the client's perspective before thinking about backend implementation. This avoids the trap of mirroring database tables in your schema. For example, a Product type should expose computed fields like formattedPrice and availabilityStatus rather than raw database columns like price_cents and inventory_count.

Discuss naming conventions and consistency: use camelCase for fields, PascalCase for types, and establish patterns that scale. For collections, always use connection-based pagination (Relay spec) with edges, nodes, and pageInfo rather than simple lists. This allows you to add edge metadata later and provides a consistent pagination interface.

For mutations, apply the input object pattern: every mutation takes a single input argument of a dedicated input type and returns a dedicated payload type with the modified object plus any user-facing errors. This pattern (established by GitHub's API) makes mutations evolvable since you can add optional fields to the input without breaking changes.
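
A hypothetical createUser mutation in this style might look like the following SDL (type and field names are illustrative, not taken from GitHub's actual schema, and a User type is assumed to exist elsewhere):

```graphql
input CreateUserInput {
  email: String!
  displayName: String!
}

type UserError {
  field: [String!]   # path into the input, e.g. ["displayName"]
  message: String!
}

type CreateUserPayload {
  user: User                 # null when the mutation fails
  userErrors: [UserError!]!  # expected, user-facing errors
}

type Mutation {
  createUser(input: CreateUserInput!): CreateUserPayload!
}
```

Because the input and payload are dedicated types, optional fields can be added to either without breaking existing clients.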

Discuss union types and interfaces for polymorphism. A search result might return Products, Categories, or Articles. Model this as a SearchResult union. Use interfaces like Node (with a global ID) for consistent refetching patterns.

Address authorization in schema design: should unauthorized fields return null, throw errors, or be hidden entirely? The most maintainable approach is returning null with an error in the errors array, annotated with the field path. This keeps the schema stable across permission levels.

Discuss schema evolution: use the @deprecated directive liberally, add new fields freely (additive changes are non-breaking), never remove fields without a deprecation period, and use schema change tooling like GraphQL Inspector to detect breaking changes in CI. For more on API design patterns and their trade-offs, see our how GraphQL works guide.

Follow-up questions:

  • How would you handle a field that is expensive to compute and not always needed?
  • What is your strategy for versioning a GraphQL schema?
  • How do you prevent schema bloat as the product grows?

3. Explain GraphQL federation and when you would choose it over schema stitching.

What the interviewer is really asking: Do you understand the architectural patterns for composing multiple GraphQL services, the trade-offs between approaches, and when monolithic schemas are actually preferable?

Answer framework:

GraphQL federation (introduced by Apollo) allows multiple teams to own independent subgraphs that compose into a single unified supergraph. Each subgraph defines the types and fields it owns, and a gateway (router) handles query planning across subgraphs at runtime. The key primitives are: @key (defines the primary key for entity resolution across subgraphs), @external (references a field defined in another subgraph), @requires (declares field dependencies from other subgraphs), and @provides (indicates a subgraph can resolve fields typically owned by another).

Schema stitching, the predecessor approach, merges schemas at the gateway level by defining explicit relationships between types from different services. The gateway must know how to delegate and transform between schemas. The fundamental problem with stitching is that it centralizes composition logic in the gateway, creating a coordination bottleneck as the number of services grows.

Federation solves this by distributing composition knowledge: each subgraph declares how its types relate to entities in other subgraphs. The composition happens at build time (composition validation) and at runtime (query planning). This means teams can deploy their subgraphs independently as long as composition checks pass.

Discuss when to choose federation: when you have 3+ teams that need independent deployment cycles, when domain boundaries are clear (users service, products service, orders service), and when you need organizational scalability. Discuss when NOT to use federation: when you have fewer than 3 services (the overhead is not worth it), when performance requirements are extreme (cross-subgraph joins add latency), or when domain boundaries are fuzzy (frequent cross-service field dependencies indicate poor service boundaries).

Address federation performance concerns: each cross-subgraph reference requires an entity resolution call. If a query touches 4 subgraphs, the gateway makes at least 4 network calls. Query planning optimization (parallelizing independent subgraph calls, batching entity resolutions) is critical. Apollo Router implements a Rust-based query planner that optimizes this. Discuss how this compares to the unified API approach detailed in our API gateway concepts.

Follow-up questions:

  • How do you handle shared types that multiple subgraphs need to extend?
  • What is your strategy for testing composed schemas before deployment?
  • How do you debug a slow query that spans four subgraphs?

4. How do you implement authentication and authorization in a GraphQL API?

What the interviewer is really asking: Can you design a security model that works with GraphQL's flexible query structure, handles field-level permissions, and does not degrade performance?

Answer framework:

Authentication and authorization in GraphQL are fundamentally different from REST because clients can request arbitrary combinations of fields in a single query. You cannot simply protect endpoints; you must protect fields.

For authentication, handle it at the transport layer before GraphQL execution begins. Extract the JWT or session token from the HTTP headers, validate it, and attach the authenticated user to the GraphQL context object. Every resolver receives this context and can access the current user. This is no different from REST.

Authorization is where GraphQL gets complex. There are three main approaches. First, resolver-level authorization: each resolver checks permissions before returning data. This is the most flexible but leads to scattered auth logic. Second, directive-based authorization: use custom schema directives like @auth(requires: ADMIN) or @hasRole(role: "editor") that wrap resolvers with permission checks. This co-locates auth rules with the schema definition. Third, middleware-layer authorization: implement a validation layer that analyzes the query before execution and rejects unauthorized field selections entirely.

The directive approach is the most maintainable for large schemas. Implement it as a schema transform that wraps field resolvers with permission checks. The directive can support role-based access (ADMIN, EDITOR, VIEWER), attribute-based access (user owns the resource), and relationship-based access (user is a member of the organization).
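
A minimal sketch of what such a directive transform does to each field resolver, assuming a context object carrying the authenticated user (all names here are illustrative, not a real library API):

```typescript
// Sketch of directive-style authorization as a resolver wrapper:
// what an @auth(requires: ...) schema transform would apply per field.
type Role = "VIEWER" | "EDITOR" | "ADMIN";
interface Context { user?: { id: string; role: Role } }
type Resolver<T> = (parent: unknown, args: unknown, ctx: Context) => T;

const rank: Record<Role, number> = { VIEWER: 0, EDITOR: 1, ADMIN: 2 };

function requireRole<T>(required: Role, resolve: Resolver<T>): Resolver<T | null> {
  return (parent, args, ctx) => {
    if (!ctx.user) throw new Error("UNAUTHENTICATED");
    if (rank[ctx.user.role] < rank[required]) {
      // Return null; a real server would also append a FORBIDDEN
      // error to the errors array with this field's path.
      return null;
    }
    return resolve(parent, args, ctx);
  };
}

// A field only admins may see, wrapped once at schema-build time.
const salaryResolver = requireRole("ADMIN", () => 120_000);
```

In a real server this wrapping happens once during schema construction, so auth rules stay co-located with the SDL rather than scattered through resolver bodies.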

Discuss field-level visibility vs field-level nullability. Some systems hide unauthorized fields from introspection entirely (schema masking). Others allow introspection but return null with an error for unauthorized access. Schema masking is more secure but complicates client development. The common production approach is to return null with a clear authorization error.

Address the performance concern: if every field resolver checks permissions, you might make redundant auth checks. Use memoization in the context to cache authorization decisions within a request. If a user is authorized to view a Post, they are likely authorized to view its title and body fields since those inherit the parent authorization.

Discuss authorization for mutations specifically: validate permissions at the mutation resolver level, not at the input validation level. A user might be authorized to update some fields of a resource but not others. Model this with fine-grained permission checks on the input fields.

Follow-up questions:

  • How do you handle authorization for nested fields that depend on the parent's data?
  • How would you implement row-level security in a GraphQL API?
  • What is your strategy for testing authorization rules comprehensively?

5. How do you prevent malicious or expensive queries from overwhelming a GraphQL server?

What the interviewer is really asking: Do you understand query complexity analysis, depth limiting, rate limiting strategies specific to GraphQL, and the operational challenges of serving an open query language?

Answer framework:

GraphQL's flexibility is a double-edged sword: the same power that lets clients request exactly what they need also lets them construct queries that are catastrophically expensive. A deeply nested query on a cyclic graph (user -> friends -> friends -> friends...) could generate millions of database queries.

Implement multiple layers of defense. First, static analysis before execution. Query depth limiting: reject queries deeper than N levels (typically 10-15). Query complexity analysis: assign a cost to each field based on its computational expense. A scalar field might cost 1, a list field costs multiplier times child complexity, and a connection field costs first/last argument times child complexity. Sum the total and reject queries above a threshold. Tools like graphql-query-complexity and graphql-validation-complexity implement this.
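
The depth and cost rules above can be sketched over a simplified selection tree. A real implementation walks the GraphQL AST; this shape and the cost constants are illustrative:

```typescript
// Simplified selection tree: each field may have a list-size multiplier
// (e.g. the `first:` argument on a connection) and child selections.
interface Field {
  name: string;
  listSize?: number;
  selections?: Field[];
}

// Depth limiting: reject queries nested deeper than N levels.
function depth(f: Field): number {
  if (!f.selections || f.selections.length === 0) return 1;
  return 1 + Math.max(...f.selections.map(depth));
}

// Complexity analysis: scalars cost 1, lists multiply child cost.
function cost(f: Field): number {
  const childCost = (f.selections ?? []).reduce((sum, c) => sum + cost(c), 0);
  if (f.listSize !== undefined) return 1 + f.listSize * childCost;
  return 1 + childCost;
}
```

For `posts(first: 50) { author { name } }` this yields depth 3 and cost 1 + 50 × 2 = 101, so a threshold of, say, 1000 would admit it while rejecting a query that nests another 50-item connection inside.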

Second, timeout-based protection during execution. Set a maximum execution time (for example, 10 seconds) and abort queries that exceed it. This catches cases where static analysis underestimates actual cost.

Third, rate limiting. Traditional per-endpoint rate limiting does not work for GraphQL because one endpoint serves all queries. Instead, implement cost-based rate limiting: each query consumes a budget proportional to its computed complexity. A simple query might cost 10 points, a complex one 500 points. Users have a budget that replenishes over time (token bucket algorithm). This is the approach GitHub's GraphQL API uses.
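
A token bucket for cost-based rate limiting can be sketched as follows (capacity and refill rate are illustrative, not GitHub's actual numbers):

```typescript
// Cost-based rate limiting: each query consumes tokens equal to its
// computed complexity; tokens replenish continuously over time.
class TokenBucket {
  private tokens: number;
  private last: number;

  constructor(
    private capacity: number,      // maximum budget
    private refillPerSec: number,  // replenishment rate
    now: number = Date.now(),
  ) {
    this.tokens = capacity;
    this.last = now;
  }

  // Returns true if the query's complexity cost fits in the budget.
  tryConsume(cost: number, now: number = Date.now()): boolean {
    const elapsedSec = (now - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.last = now;
    if (cost > this.tokens) return false;
    this.tokens -= cost;
    return true;
  }
}
```

In production the bucket state lives in a shared store like Redis, keyed per API consumer, and the remaining budget is echoed back to clients in response headers.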

Fourth, persisted queries. Instead of accepting arbitrary query strings, require clients to register an allowlist of queries at build time; the server refuses anything not on the list. (Automatic persisted queries, or APQ, are a related but distinct technique: they replace query strings with hashes to save bandwidth, and an unknown query can still be registered at runtime, so APQ alone is not a security control.) Allowlisting eliminates the entire class of malicious query attacks but requires build-time coordination. It works well for first-party clients but not for public APIs.

Fifth, response size limiting: cap the total response size and the number of objects returned. This prevents queries that request millions of items through pagination abuse.

Discuss monitoring: track query complexity distribution, identify queries above the 95th percentile, and proactively reach out to teams whose queries are approaching limits. Build dashboards showing the most expensive queries, sorted by total cost multiplied by frequency.

Follow-up questions:

  • How do you assign accurate complexity costs to fields whose cost depends on runtime data?
  • What is your strategy for communicating rate limit status to clients?
  • How do you handle a legitimate query that exceeds complexity limits?

6. How would you implement real-time features with GraphQL subscriptions at scale?

What the interviewer is really asking: Do you understand the WebSocket-based subscription lifecycle, fan-out patterns, connection management at scale, and the operational challenges of persistent connections?

Answer framework:

GraphQL subscriptions provide real-time data through persistent connections, typically WebSocket. The client sends a subscription operation, the server maintains the connection and pushes data whenever the subscribed event occurs. The GraphQL over WebSocket protocol (graphql-ws) defines the handshake and message format.

For the subscription lifecycle: the client connects via WebSocket and sends connection_init with auth credentials; the server validates and responds with connection_ack; the client sends subscribe with the subscription query; the server streams next messages containing data as events fire; either side sends complete to end the subscription.

The primary scaling challenge is maintaining millions of persistent WebSocket connections. Each connection consumes server memory (kernel buffers, application state). A single server can typically handle 50,000-100,000 concurrent connections depending on memory. For millions of connections, you need a fleet of WebSocket servers with a pub/sub backbone.

For the architecture: WebSocket gateway servers manage client connections. A pub/sub system (Redis Pub/Sub, Kafka, or NATS) distributes events. When a mutation triggers a subscription event, the mutating service publishes to the pub/sub system. All WebSocket servers subscribed to that topic receive the event and push to relevant clients.

Discuss subscription filtering: not every event is relevant to every subscriber. Implement server-side filtering so the pub/sub system or the WebSocket server only delivers events matching the subscription's filter criteria. For example, a subscription to order updates should only receive events for the authenticated user's orders.

Address connection management: implement heartbeat/keepalive to detect dead connections. Handle reconnection gracefully: when a client reconnects, it should receive any events it missed during disconnection (use event IDs and replay). Implement backpressure: if a client cannot consume events fast enough, buffer up to a limit then drop oldest events.
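
The drop-oldest backpressure policy can be sketched as a toy per-client buffer (the limit and drop strategy are illustrative; some systems prefer dropping newest or disconnecting the client instead):

```typescript
// Per-client backpressure: buffer pending events up to a limit,
// then drop the oldest so a slow consumer still sees recent state.
class EventBuffer<T> {
  private items: T[] = [];
  dropped = 0;

  constructor(private limit: number) {}

  push(event: T): void {
    if (this.items.length >= this.limit) {
      this.items.shift(); // evict oldest event
      this.dropped += 1;  // track drops for monitoring/alerting
    }
    this.items.push(event);
  }

  // Called when the socket is writable again.
  drain(): T[] {
    const out = this.items;
    this.items = [];
    return out;
  }
}
```

The `dropped` counter matters operationally: a rising drop rate per client is the signal to shed that connection or alert on downstream slowness.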

Discuss alternatives to WebSocket-based subscriptions: Server-Sent Events (SSE) for unidirectional streams (simpler, works through HTTP proxies), and live queries (the server re-executes the query on data change and sends the diff). Live queries are simpler for clients but more expensive for servers.

Follow-up questions:

  • How do you handle subscription authentication and token refresh on long-lived connections?
  • What happens when a WebSocket server crashes with 50,000 active connections?
  • How do you load test a subscription system?

7. Compare schema-first and code-first approaches to building a GraphQL server. When do you choose each?

What the interviewer is really asking: Have you operated GraphQL servers long enough to have opinions about developer experience, type safety, and maintainability trade-offs between these paradigms?

Answer framework:

Schema-first (SDL-first) means writing the GraphQL schema definition language file first, then implementing resolvers that match. Tools: Apollo Server with .graphql files, graphql-tools. Code-first means defining the schema programmatically through code constructs that generate the SDL. Tools: Nexus, TypeGraphQL, Pothos (TypeScript), Strawberry (Python), gqlgen (Go).

Schema-first advantages: the schema serves as a clear contract and documentation artifact. Non-engineers (product managers, mobile developers) can review and discuss schema changes in PRs. Schema changes are explicit and visible in diffs. It enforces a design-first workflow where you think about the API surface before implementation.

Schema-first disadvantages: resolver implementations can drift from the schema (no compile-time guarantee that all fields have resolvers). Input types must be manually kept in sync with TypeScript interfaces. Code generation tools (like graphql-codegen) mitigate this but add build complexity. Complex schemas with many directives and custom scalars become unwieldy in SDL.

Code-first advantages: full type safety from schema definition to resolver implementation. If you rename a type, the compiler catches all references. Resolvers are co-located with type definitions, improving discoverability. Complex patterns like generic pagination or polymorphic types are easier to express programmatically. No separate code generation step.

Code-first disadvantages: the schema is implicit, making it harder to review API surface changes in PRs without generated artifacts. Developers must learn the code-first library's API in addition to GraphQL concepts. Schema output must be committed or generated in CI for clients.

When to choose schema-first: public APIs where the schema is the product, teams where non-engineers participate in API design reviews, and organizations standardized on schema-first tooling. When to choose code-first: TypeScript-heavy backends where type safety is paramount, rapid development teams where schema and implementation evolve together, and complex domains where programmatic schema construction reduces boilerplate.

In practice, many mature organizations use a hybrid: code-first for implementation but generate and commit the SDL for documentation and breaking change detection. For a deeper exploration of GraphQL implementation patterns, see our how GraphQL works guide.

Follow-up questions:

  • How do you enforce schema conventions in a code-first world?
  • What is your strategy for generating client types from the schema?
  • How do you handle schema previews and experimental fields?

8. How do you implement efficient caching for a GraphQL API?

What the interviewer is really asking: Do you understand why HTTP caching does not work naturally with GraphQL and what alternative strategies exist at multiple layers?

Answer framework:

GraphQL's single-endpoint, POST-based query model breaks traditional HTTP caching. Every request goes to the same URL with a different body, so CDNs and browser caches cannot differentiate between queries. This is one of the primary trade-offs versus REST, as discussed in our REST vs GraphQL comparison.

Layer 1: Client-side normalized caching. Apollo Client and urql implement normalized caches that store entities by their unique ID and type. When a query returns a User with id:123, it is stored in the cache under User:123. Subsequent queries that include the same user resolve from cache. This eliminates redundant network requests and provides instant UI updates after mutations. The cache can be configured with field-level cache policies (cache-first, network-only, cache-and-network).
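
The normalization idea can be sketched as a map keyed by typename and ID. This is a toy version of what Apollo Client and urql do; the merge policy here is deliberately simplistic:

```typescript
// Normalized cache: every entity is stored once under "Typename:id",
// so any query returning the same entity reads/writes the same record.
interface Entity {
  __typename: string;
  id: string;
  [field: string]: unknown;
}

class NormalizedCache {
  private store = new Map<string, Entity>();

  write(entity: Entity): void {
    const key = `${entity.__typename}:${entity.id}`;
    // Merge fields so partial results from different queries accumulate.
    this.store.set(key, { ...this.store.get(key), ...entity });
  }

  read(typename: string, id: string): Entity | undefined {
    return this.store.get(`${typename}:${id}`);
  }
}
```

Because both queries below hit the same `User:123` record, a mutation that rewrites it updates every cached query result that references that user.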

Layer 2: CDN/edge caching with persisted queries. Since persisted queries have deterministic IDs, you can serve them via GET requests with the query ID as a URL parameter. This makes them cacheable by standard HTTP infrastructure. Set Cache-Control headers based on the query's data sensitivity and freshness requirements. Tools like Apollo Router and GraphCDN (now Stellate) implement this.

Layer 3: Server-side response caching. Cache full query responses keyed by the normalized query string plus variables. Apply different TTLs based on query content: queries touching only public, slowly-changing data (product catalog) get long TTLs; queries with user-specific data get short or no TTLs. Implement cache invalidation by tagging cached responses with the entity IDs they contain; when an entity changes, purge all responses containing it.
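
Tag-based invalidation can be sketched as a response cache that remembers which entity keys each response touched (illustrative and in-memory only; a production version would live in Redis or a CDN purge API):

```typescript
// Server-side response cache with entity-tag invalidation: each cached
// response records the entity keys it contains; writing to any of those
// entities purges every response that touched it.
class TaggedResponseCache {
  private responses = new Map<string, unknown>();
  private byTag = new Map<string, Set<string>>();

  set(queryKey: string, response: unknown, tags: string[]): void {
    this.responses.set(queryKey, response);
    for (const tag of tags) {
      if (!this.byTag.has(tag)) this.byTag.set(tag, new Set());
      this.byTag.get(tag)!.add(queryKey);
    }
  }

  get(queryKey: string): unknown {
    return this.responses.get(queryKey);
  }

  // Called from the write path: "Product:9 changed".
  invalidate(tag: string): void {
    for (const key of this.byTag.get(tag) ?? []) this.responses.delete(key);
    this.byTag.delete(tag);
  }
}
```

The query key would be the normalized query string plus variables; the tags come from the entity IDs observed while resolving the response.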

Layer 4: Resolver-level caching. Cache individual resolver results using Redis or Memcached. Key by type, ID, and field name. This is particularly effective for expensive computed fields (aggregations, recommendations). Use DataLoader's built-in per-request cache to avoid redundant fetches within a single query execution.

Layer 5: Database query caching. For read-heavy workloads, cache database query results. This is not GraphQL-specific but applies broadly.

Discuss cache invalidation strategies: TTL-based (simple, allows staleness), event-driven (publish invalidation events when data changes), and hybrid (short TTL plus event-driven for critical updates). Discuss the cache-control directive proposal for GraphQL that allows schema authors to annotate fields with caching hints.

Follow-up questions:

  • How do you handle caching for authenticated queries where responses differ per user?
  • What is your strategy for cache warming after a deployment?
  • How do you measure cache effectiveness across the GraphQL layer?

9. How do you handle errors in a GraphQL API, and what error handling patterns do you recommend?

What the interviewer is really asking: Do you understand that GraphQL's error model differs fundamentally from REST, and can you design error handling that serves both client developers and operational observability?

Answer framework:

GraphQL returns HTTP 200 for almost every response, including those with errors. The response body contains both data (partial results) and errors (an array of error objects). This means traditional HTTP status code monitoring is insufficient. A 200 response might contain a complete failure.

The GraphQL spec defines errors as objects with message, locations (where in the query the error occurred), path (which field failed), and extensions (arbitrary metadata). The key insight is that GraphQL supports partial responses: if one field in a query fails, other fields can still resolve successfully. This is powerful for resilient UIs but complicates error handling.

Discuss error categorization. Client errors (validation failures, invalid arguments) should be modeled as part of the schema using union return types or error fields in mutation payloads. Server errors (database failures, timeout) should appear in the top-level errors array. This distinction matters because client errors are expected and should have structured types that clients can programmatically handle, while server errors are unexpected and should be generic.

For mutations, the recommended pattern is a result union type: mutation createUser returns CreateUserSuccess | ValidationError | PermissionError. This makes all possible outcomes explicit in the schema and gives clients exhaustive type checking. The alternative (returning errors in the errors array) loses type safety and makes client error handling fragile.

For observability, extend errors with structured metadata in the extensions field: error codes (UNAUTHENTICATED, FORBIDDEN, NOT_FOUND, INTERNAL), correlation IDs for tracing, and timestamps. Never expose internal details (stack traces, database errors) to clients in production. Log full details server-side, keyed by correlation ID.

Discuss error masking: graphql-js and most servers provide a formatError hook that sanitizes errors before sending to clients. Use this to replace unexpected errors with a generic message while preserving the original error in server logs.
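
A formatError-style hook along these lines might look like the following. The error shape and code list are illustrative, not graphql-js's actual types:

```typescript
// Error masking: expected errors pass through with their code; anything
// unexpected is replaced with a generic message and logged server-side
// under a correlation ID so it can be found later.
const EXPECTED_CODES = new Set(["UNAUTHENTICATED", "FORBIDDEN", "NOT_FOUND"]);

interface AppError {
  message: string;
  code?: string;
  path?: string[];
}

function formatError(
  err: AppError,
  correlationId: string,
  log: (line: string) => void,
) {
  if (err.code && EXPECTED_CODES.has(err.code)) {
    // Expected: safe to show the client as-is.
    return { message: err.message, extensions: { code: err.code, correlationId } };
  }
  // Unexpected: keep the details (stack, DB errors) server-side only.
  log(`[${correlationId}] ${err.message}`);
  return {
    message: "Internal server error",
    extensions: { code: "INTERNAL", correlationId },
  };
}
```

The correlation ID appears in both the client response and the server log line, which is what makes "a user reports an error" debuggable without leaking internals.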

Discuss partial error resilience: when a non-critical field fails, should the entire query fail? Under GraphQL's nullable-by-default convention, an error on a non-null field propagates null upward to the nearest nullable ancestor, potentially blanking out a large subtree. Design your schema nullability carefully: make fields non-null only when you are certain they will always resolve successfully.

Follow-up questions:

  • How do you monitor error rates in a system where everything returns HTTP 200?
  • What is your strategy for error retry logic on the client?
  • How do you handle errors in subscriptions?

10. How would you migrate a REST API to GraphQL incrementally?

What the interviewer is really asking: Can you plan a pragmatic migration that delivers value incrementally without requiring a big-bang rewrite or disrupting existing clients?

Answer framework:

The key principle is incremental adoption: GraphQL and REST coexist during migration. Never attempt a big-bang rewrite. The migration typically follows four phases.

Phase 1: GraphQL as a BFF (Backend for Frontend). Deploy a GraphQL server that wraps existing REST endpoints. Resolvers call REST APIs internally. This delivers immediate value to frontend teams (single request, typed responses, flexible queries) while the backend remains unchanged. Use DataLoader to batch multiple REST calls. This phase typically takes 2-4 weeks and proves the value proposition.

Phase 2: Optimize hot paths. Identify the most frequently used queries and bypass the REST layer for those, connecting resolvers directly to databases or internal services. Monitor performance improvements. This is where you start building the system design patterns that will support the final architecture.

Phase 3: Domain-by-domain migration. Migrate one domain at a time (users, products, orders) from REST-backed resolvers to native GraphQL resolvers. Each domain migration is independent and can be rolled back. Maintain the REST API for existing clients during this phase.

Phase 4: REST deprecation. Once all critical paths run natively through GraphQL, deprecate REST endpoints with a sunset timeline. Maintain REST stubs that proxy to GraphQL internally for clients that cannot migrate.

Discuss the technical challenges: mapping REST resources to GraphQL types (REST's flat resources vs GraphQL's nested types), handling REST pagination patterns in GraphQL (offset-based to cursor-based), and translating REST error codes to GraphQL error patterns.

Discuss organizational challenges: training teams on GraphQL, establishing schema governance, choosing tooling, and building buy-in from teams that own REST services. The API gateway pattern can ease the transition by providing a single entry point that routes to either GraphQL or REST based on the request.

Address the question of whether to migrate at all: if your API serves primarily server-to-server communication with stable schemas, REST or gRPC might be more appropriate. GraphQL shines for client-facing APIs with diverse client needs.

Follow-up questions:

  • How do you handle authentication differences between the REST and GraphQL layers during migration?
  • What metrics do you track to measure migration progress and success?
  • How do you maintain backward compatibility for mobile clients that cannot be force-updated?

11. How do you implement pagination in GraphQL, and what are the trade-offs between approaches?

What the interviewer is really asking: Do you understand cursor-based vs offset-based pagination, the Relay connection specification, and the performance implications of each approach?

Answer framework:

GraphQL supports three pagination approaches, each with distinct trade-offs.

Offset-based pagination: pass offset and limit arguments. Simple to implement, maps directly to SQL OFFSET/LIMIT. Problems: poor performance on large offsets (database must scan and discard rows), inconsistent results when data changes between pages (insertions cause items to appear twice or be skipped). Suitable only for small, static datasets or admin interfaces where UX requirements are relaxed.

Cursor-based pagination (Relay connection spec): the standard for production GraphQL APIs. Uses opaque cursors (typically base64-encoded primary keys or timestamps) to mark position. Arguments: first/after (forward pagination) and last/before (backward pagination). Returns edges (with cursor and node) and pageInfo (hasNextPage, hasPreviousPage, startCursor, endCursor).

Cursor-based advantages: consistent results regardless of concurrent insertions/deletions, performant at any depth (database uses indexed WHERE clause rather than OFFSET), and compatible with infinite scroll UIs. The cursor encodes the sort position, so the database can seek directly to the correct row.

Implementation details: for a cursor encoding the createdAt timestamp and id (for tie-breaking), the database query becomes WHERE (created_at, id) < (cursor_timestamp, cursor_id) ORDER BY created_at DESC, id DESC LIMIT first+1. Fetching first+1 rows lets you determine hasNextPage without a separate count query.
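The cursor mechanics above can be sketched in TypeScript. This is a minimal illustration, assuming a hypothetical `posts` table with `created_at` and `id` columns; the SQL string and placeholder style are examples, not tied to a specific driver.

```typescript
interface Cursor { createdAt: string; id: number }

// Encode the sort position as an opaque base64 cursor.
function encodeCursor(c: Cursor): string {
  return Buffer.from(`${c.createdAt}|${c.id}`).toString("base64");
}

function decodeCursor(cursor: string): Cursor {
  const [createdAt, id] = Buffer.from(cursor, "base64").toString("utf8").split("|");
  return { createdAt, id: Number(id) };
}

// Build the keyset query described above; first+1 rows are fetched so
// hasNextPage can be derived without a separate count query.
function buildKeysetQuery(after: string | null, first: number): { sql: string; params: unknown[] } {
  let sql = "SELECT * FROM posts";
  const params: unknown[] = [];
  if (after) {
    const c = decodeCursor(after);
    sql += " WHERE (created_at, id) < ($1, $2)";
    params.push(c.createdAt, c.id);
  }
  sql += ` ORDER BY created_at DESC, id DESC LIMIT ${first + 1}`;
  return { sql, params };
}
```

Keeping the cursor opaque (base64) lets you change its internal encoding later without breaking clients.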

The Relay connection spec also defines totalCount (optional, expensive for large datasets), and edge-level metadata (useful for modeling relationship attributes like "added_at" on a friendship edge).

Discuss the trade-off with totalCount: computing exact counts on large tables is expensive (full table scans). Options include cached approximate counts, estimated counts from query planner statistics, or simply not providing totalCount. For most UIs, hasNextPage is sufficient.

Discuss keyset pagination as the underlying database technique: it requires a deterministic sort order with a unique tiebreaker. Composite cursors handle multi-column sorts. For complex sort orders (relevance scoring), consider using search engine cursors rather than database cursors.

Pagination is one of the most practically important data access topics to master; our learning paths cover these patterns in depth.

Follow-up questions:

  • How do you implement bidirectional pagination efficiently?
  • What is your strategy for paginating through filtered results where the filter is expensive?
  • How do you handle cursor invalidation when underlying data is deleted?

12. How do you design a GraphQL API for a mobile-first application with offline support?

What the interviewer is really asking: Do you understand the unique constraints of mobile clients (bandwidth, latency, battery) and how GraphQL features can be leveraged or adapted for offline-capable applications?

Answer framework:

Mobile clients face constraints that demand specific GraphQL patterns: variable network quality, bandwidth costs, battery limitations, and the need for offline functionality.

For bandwidth optimization: persisted queries replace full query strings with short hashes, reducing request payload from kilobytes to bytes. Automatic persisted queries (APQ) do this transparently: the client sends a hash, the server looks up the query. If not found, the client retransmits the full query once. Response compression (gzip/brotli) reduces response size by 60-80%. The @defer and @stream directives allow the server to send critical data first and stream slower fields asynchronously.
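The APQ handshake can be sketched as follows. This is a simplified illustration, assuming an in-memory `Map` in place of the server's real query store; function names are ours, not a specific library's API.

```typescript
import { createHash } from "crypto";

const queryStore = new Map<string, string>();

function sha256(query: string): string {
  return createHash("sha256").update(query).digest("hex");
}

// Server side: resolve a hash to a stored query. If the hash is unknown
// and no full query was sent, signal a miss (PERSISTED_QUERY_NOT_FOUND).
function lookupQuery(hash: string, fullQuery?: string): string | null {
  if (fullQuery) {
    queryStore.set(sha256(fullQuery), fullQuery); // register on first miss
    return fullQuery;
  }
  return queryStore.get(hash) ?? null;
}

// Client side: try the hash first; retransmit the full query once on a miss.
function sendOperation(query: string): string {
  const hash = sha256(query);
  const hit = lookupQuery(hash);
  return hit ?? lookupQuery(hash, query)!;
}
```

After the first round trip, every subsequent request for the same operation carries only the hash.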

For offline support: the normalized client cache (Apollo Client, urql) serves as the offline data store. Queries against cached data resolve instantly without network. Implement optimistic mutations: when the user performs an action offline, immediately update the local cache as if the mutation succeeded, queue the mutation for later execution, and reconcile when connectivity returns.

Design the schema to support offline conflict resolution: include version fields or updatedAt timestamps on mutable types. When syncing queued mutations, detect conflicts (server version differs from the version the mutation was based on) and apply resolution strategies (last-write-wins, manual merge, or server-wins).
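The version-based conflict detection described above can be sketched like this. Names and the two resolution strategies shown are illustrative; a real system might surface conflicts to the user for manual merge instead.

```typescript
interface Entity { id: string; version: number; title: string }

// "applied" means no concurrent edit was detected; "conflict" means the
// server version moved past the version the offline edit was based on.
type Resolution = "applied" | "conflict";

function applyQueuedMutation(
  server: Entity,
  baseVersion: number,            // version the offline edit was based on
  patch: Partial<Entity>,
  strategy: "last-write-wins" | "server-wins" = "server-wins"
): { result: Resolution; entity: Entity } {
  if (server.version === baseVersion) {
    // No concurrent edit: apply the patch and bump the version.
    return { result: "applied", entity: { ...server, ...patch, version: server.version + 1 } };
  }
  if (strategy === "last-write-wins") {
    // Conflict detected, but the queued edit overwrites the server state.
    return { result: "conflict", entity: { ...server, ...patch, version: server.version + 1 } };
  }
  return { result: "conflict", entity: server }; // server-wins: discard the patch
}
```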

For battery optimization: batch network requests rather than making individual calls. Use the @defer directive to avoid blocking render on slow fields. Implement intelligent background sync that respects battery level and network type (defer large syncs until WiFi).

Discuss the query architecture for mobile: design queries around screen-level data needs. Each screen has one or two queries that fetch all required data. This minimizes round trips. Use fragments to share type definitions across screens. Design the schema with mobile list views in mind: provide summary fields (title, thumbnail, preview) separately from detail fields (fullDescription, highResImages).

For sync architecture, consider a delta sync approach: a query like syncChanges(since: timestamp) returns all entities modified since the last sync. This is more efficient than refetching everything. Model the response as a union of created/updated/deleted events.
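The union of created/updated/deleted events maps naturally onto a discriminated union on the client. A minimal sketch, with illustrative type and field names, applying a sync batch to a normalized cache:

```typescript
// Delta-sync response modeled as a discriminated union of change events.
type SyncEvent =
  | { kind: "created" | "updated"; entity: { id: string; data: unknown } }
  | { kind: "deleted"; id: string };

// Apply a batch of sync events to a client-side normalized cache.
function applySync(cache: Map<string, unknown>, events: SyncEvent[]): void {
  for (const e of events) {
    if (e.kind === "deleted") cache.delete(e.id);
    else cache.set(e.entity.id, e.entity.data);
  }
}
```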

Follow-up questions:

  • How do you handle mutation ordering when replaying offline mutations?
  • What is your strategy for cache eviction on memory-constrained devices?
  • How do you handle schema changes when mobile clients cannot be force-updated?

13. How do you implement field-level observability and performance monitoring in a GraphQL API?

What the interviewer is really asking: Can you build operational visibility into a system where traditional request-level metrics are insufficient because a single endpoint serves infinitely varied queries?

Answer framework:

GraphQL's single-endpoint nature renders traditional API monitoring (status codes per endpoint, latency per route) nearly useless. A GraphQL-specific observability strategy requires instrumentation at multiple levels.

Operation-level metrics: parse and normalize each incoming query to an operation signature (strip variables, sort fields deterministically). Track latency, error rate, and frequency per operation signature. This is the GraphQL equivalent of per-endpoint metrics in REST. Apollo Studio and Stellate provide this automatically.
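A heavily simplified sketch of signature normalization: collapse whitespace and strip inline literal values so structurally identical queries map to one signature. Production tools do this over a real AST (also sorting fields and normalizing aliases); this regex version is only illustrative.

```typescript
function operationSignature(query: string): string {
  return query
    .replace(/"(?:[^"\\]|\\.)*"/g, '""')  // blank out string literals
    .replace(/\b\d+(\.\d+)?\b/g, "0")     // blank out numeric literals
    .replace(/\s+/g, " ")                  // collapse whitespace
    .trim();
}
```

Two requests that differ only in literal values or formatting now aggregate under one metric key.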

Field-level metrics: instrument each resolver with execution time tracking. Track p50, p95, and p99 latency per field. Identify fields that are slow, frequently erroring, or rarely used (candidates for deprecation). Implement this with a resolver wrapper that records start/end timestamps and reports to your metrics system.
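The resolver wrapper can be sketched as below, assuming a generic resolver signature; `fieldLatencies` stands in for a real metrics sink (StatsD, Prometheus, and so on).

```typescript
type Resolver<T> = (parent: unknown, args: unknown, ctx: unknown) => Promise<T> | T;

const fieldLatencies = new Map<string, number[]>();

function instrument<T>(fieldName: string, resolve: Resolver<T>): Resolver<T> {
  return (parent, args, ctx) => {
    const start = process.hrtime.bigint();
    const record = () => {
      const ms = Number(process.hrtime.bigint() - start) / 1e6;
      const samples = fieldLatencies.get(fieldName) ?? [];
      samples.push(ms); // compute p50/p95/p99 from these samples when reporting
      fieldLatencies.set(fieldName, samples);
    };
    const result = resolve(parent, args, ctx);
    // Handle both sync and async resolvers without changing their return type.
    if (result instanceof Promise) return result.finally(record) as Promise<T>;
    record();
    return result;
  };
}
```

In practice you would apply this wrapper via a schema plugin or middleware rather than wrapping each resolver by hand.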

Query complexity tracking: log the computed complexity score for each query alongside its execution time. This validates your complexity scoring model: if high-complexity queries are fast, your scoring overestimates; if low-complexity queries are slow, it underestimates.

Distributed tracing integration: propagate trace context through resolver execution. Each resolver span should show its parent field, execution time, and any downstream calls (database queries, HTTP calls to other services). In a federated architecture, trace spans across the gateway and all subgraphs to identify which subgraph is causing latency.

Client-side telemetry: instrument client-side query execution with cache hit rates, network latency, total render time, and error rates. Correlate client metrics with server metrics using operation signatures. This reveals issues like: the server resolves in 50ms but the client waits 2 seconds due to large response parsing on low-end devices.

Build dashboards answering key questions: What are our slowest operations? Which fields have the highest error rates? Which deprecated fields still have traffic? What is our p95 query latency trend? Which clients send the most expensive queries? For operations teams at companies like Meta and Google, these metrics are essential.

Implement alerting on: operation latency regression (p95 exceeds baseline by 50%), field error rate spikes, query rejection rate increases (complexity limit), and subscription connection count anomalies.

Follow-up questions:

  • How do you attribute GraphQL server cost to specific clients or teams?
  • What is your approach to performance budgets per query?
  • How do you use observability data to guide schema evolution decisions?

14. How do you handle file uploads in a GraphQL API?

What the interviewer is really asking: Do you understand the limitations of GraphQL for binary data and the pragmatic patterns for handling uploads without compromising the API's design integrity?

Answer framework:

GraphQL was designed for structured data, not binary transfers. The spec does not natively support file uploads. Three main approaches exist.

Approach 1: Separate upload endpoint. Use a traditional REST endpoint (or presigned URL) for the file upload, receive a file URL or ID, then use a GraphQL mutation to attach the file to an entity. This is the cleanest separation of concerns. Example: client uploads image to S3 via presigned URL, receives the URL, then calls a GraphQL mutation setProfilePhoto(url: "..."). Advantages: uses proven upload infrastructure, supports resumable uploads, avoids GraphQL server memory pressure. This is the approach recommended for production systems.

Approach 2: GraphQL multipart request spec. The graphql-upload library implements a spec for multipart form data that includes GraphQL operations. The file is sent as a multipart part alongside the query. The server maps the file stream to a resolver argument. Advantages: single request, feels native to GraphQL. Disadvantages: not all clients support it, complicates CDN/proxy configuration, loads entire files into server memory, and the library has had security vulnerabilities.

Approach 3: Base64 encoding in a mutation argument. Encode the file as a base64 string and pass it as a regular String argument. Simple but terrible for performance: 33% size overhead, entire file must be in memory, no streaming, and blocks the GraphQL execution engine.

The recommended production pattern is Approach 1 (presigned URLs) with a workflow: (1) client calls a mutation requestUploadUrl(filename, contentType, size) that returns a presigned URL and an upload ID, (2) client uploads directly to cloud storage using the presigned URL, (3) client calls a mutation completeUpload(uploadId) that triggers server-side validation (file type, size, virus scan), (4) the server marks the upload as ready and associates it with the entity.
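The server side of that four-step workflow can be sketched as a small state machine. This is an illustrative sketch: `signUrl` stands in for a real cloud-storage SDK call, the size limit is arbitrary, and validation is reduced to a comment.

```typescript
import { randomUUID } from "crypto";

type UploadState = "pending" | "uploaded" | "ready";

const uploads = new Map<string, { state: UploadState; contentType: string; size: number }>();

function signUrl(uploadId: string): string {
  return `https://storage.example.com/uploads/${uploadId}?signature=...`; // placeholder
}

// Step 1: mint an upload ID and a presigned URL; enforce limits up front.
function requestUploadUrl(filename: string, contentType: string, size: number) {
  if (size > 10 * 1024 * 1024) throw new Error("File too large");
  const uploadId = randomUUID();
  uploads.set(uploadId, { state: "pending", contentType, size });
  return { uploadId, url: signUrl(uploadId) };
}

// Step 3: the client reports completion; the server validates and finalizes.
function completeUpload(uploadId: string): UploadState {
  const upload = uploads.get(uploadId);
  if (!upload) throw new Error("Unknown upload");
  // A real implementation would verify the object exists in storage, check
  // its actual type and size against what was declared, and queue a virus scan.
  upload.state = "ready";
  return upload.state;
}
```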

This pattern supports: progress tracking, resumable uploads (using multipart upload protocols), direct-to-CDN upload (avoiding server bandwidth), and async processing (image resizing, transcoding). For large file handling at scale, understanding the system design patterns for distributed storage is essential.

Follow-up questions:

  • How do you handle upload progress in a GraphQL-native way?
  • What is your validation strategy for uploaded files?
  • How do you handle uploads in a federated architecture where the file storage is owned by a separate subgraph?

15. How do you test a GraphQL API comprehensively?

What the interviewer is really asking: Do you have a testing strategy that covers schema validation, resolver logic, integration behavior, and performance, accounting for GraphQL's unique characteristics?

Answer framework:

A comprehensive GraphQL testing strategy operates at four levels.

Level 1: Schema-level tests. Validate that the schema is syntactically valid and follows conventions. Use schema linting (graphql-eslint) to enforce naming conventions, deprecation usage, and documentation requirements. Test that schema changes are backward-compatible using graphql-inspector or similar tools in CI. Validate that the composed schema (in federation) compiles without conflicts.

Level 2: Resolver unit tests. Test individual resolvers in isolation. Mock dependencies (databases, external services) and verify that resolvers correctly transform data, handle errors, enforce authorization, and call dependencies with correct parameters. For resolvers using DataLoader, test that batching works correctly by verifying the batch function receives the expected keys.
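A resolver unit test in this style might look like the sketch below; the resolver shape and loader name are illustrative, not a specific framework's API.

```typescript
interface User { id: string; name: string }

const postResolvers = {
  Post: {
    // Resolver under test: delegates to the loader on the request context.
    author: (
      post: { authorId: string },
      _args: unknown,
      ctx: { userLoader: { load(id: string): User } }
    ): User => ctx.userLoader.load(post.authorId),
  },
};

// Mock the loader, then verify the resolver passes the correct key through.
const calls: string[] = [];
const mockCtx = {
  userLoader: { load: (id: string): User => { calls.push(id); return { id, name: "Ada" }; } },
};
const author = postResolvers.Post.author({ authorId: "u1" }, {}, mockCtx);
```

The same pattern extends to authorization checks: pass a context with a non-owner viewer and assert the resolver throws or returns null.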

Level 3: Integration tests. Execute full GraphQL queries against the running server with real (or test) databases. These tests verify the interaction between resolvers, DataLoaders, context setup, and error handling. Write tests for each significant operation: queries, mutations, subscriptions. Test both the happy path and error cases. Include tests for authorization (verify that users cannot access other users' data).

Level 4: Performance and load tests. Profile query execution time for key operations. Set performance budgets and fail CI if queries exceed thresholds. Load test with realistic query distributions (not just one query type). For subscription systems, test connection scaling and message throughput.

Additional testing concerns specific to GraphQL. Snapshot testing the schema: commit the generated SDL and fail CI if it changes unexpectedly (catches unintentional breaking changes). Test N+1 queries: instrument test runs to count database queries and assert that DataLoader batching is working. Test pagination edge cases: empty results, single results, exact page boundaries. Test error propagation: verify that field errors bubble up correctly through nullable boundaries.
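The N+1 assertion mentioned above can be sketched as a regression test: count data-source calls issued while resolving a list and assert that batching kept the count at one. `fakeDb` and `batchLoad` are illustrative stand-ins for a real data source and a DataLoader batch function.

```typescript
let queryCount = 0;

function fakeDb(ids: number[]): string[] {
  queryCount++; // one increment per batched call, regardless of key count
  return ids.map((id) => `user-${id}`);
}

// DataLoader-style batch function: all collected keys go down in one call.
function batchLoad(ids: number[]): string[] {
  return fakeDb(ids);
}

// Resolving 50 list items should cost exactly one query, not 50.
const authors = batchLoad(Array.from({ length: 50 }, (_, i) => i));
```

An unbatched implementation would drive `queryCount` to 50 and fail the assertion, catching the regression in CI.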

For contract testing in federated architectures: each subgraph should test that it correctly implements the entities other subgraphs expect. Use composition checks in CI: fail the build if a subgraph change would break the composed schema.

Discuss test data management: use factories or builders to create test entities. Reset state between tests. For integration tests against real databases, use transactions that roll back after each test. For a structured approach to interview preparation including testing patterns, explore our learning paths.

Follow-up questions:

  • How do you test subscription resolvers?
  • What is your strategy for mocking external services in GraphQL integration tests?
  • How do you test that deprecated fields are no longer used by any client?

Common Mistakes in GraphQL Interviews

  1. Treating GraphQL as a database query language. GraphQL is a client-server protocol, not a database abstraction. Resolvers should encapsulate business logic and data access, not expose raw database capabilities. Saying you would expose arbitrary filters and sorts via GraphQL arguments reveals a misunderstanding of proper schema design.

  2. Ignoring the N+1 problem. If you describe a resolver architecture without mentioning DataLoader or batching, it signals a lack of production experience. Every senior GraphQL engineer has fought this battle.

  3. Not understanding federation trade-offs. Blindly recommending federation for every multi-service architecture shows a lack of nuance. Federation adds latency, operational complexity, and composition challenges. Justify it with organizational scale, not technical elegance.

  4. Dismissing caching concerns. Saying GraphQL cannot be cached reveals ignorance of the tooling ecosystem. Normalized client caches, CDN caching with persisted queries, and response caching are well-established patterns.

  5. Overlooking security implications. GraphQL's flexibility creates attack surfaces that do not exist in REST. Query depth bombs, complexity attacks, and introspection information leakage are real production concerns. Senior engineers must address these proactively.

How to Prepare for GraphQL Interviews

Build hands-on experience with a production-grade GraphQL server. Set up Apollo Server or Mercurius with a real database, implement DataLoader batching, add authentication middleware, and deploy with proper observability. Reading documentation is insufficient; you need to encounter and solve real problems.

Study the GraphQL specification directly. Understanding the execution algorithm, type system, and introspection system at the spec level distinguishes senior candidates from those who only know library APIs.

Review the architectures of public GraphQL APIs: GitHub's API (excellent mutation patterns, complexity scoring), Shopify's Storefront API (commerce-optimized schema design), and Yelp's API (search and geospatial patterns). These expose you to production schema design decisions.

Practice explaining trade-offs: schema-first vs code-first, federation vs monolith, subscriptions vs polling, REST vs GraphQL. Interviewers want to see that you can make and defend architectural decisions with concrete reasoning.

For a comprehensive preparation roadmap, explore our learning paths and system design interview guide. Understanding how GraphQL relates to alternatives like gRPC and REST demonstrates architectural breadth. Consider our pricing plans for structured interview preparation with expert guidance.

Related Resources

GO DEEPER

Master this topic in our 12-week cohort

Our Advanced System Design cohort covers this and 11 other deep-dive topics with live sessions, assignments, and expert feedback.