Serverless Interview Questions for Senior Engineers (2026)
Top serverless interview questions with detailed answer frameworks covering Lambda architecture, cold starts, event-driven patterns, cost optimization, and production-grade serverless design used at top technology companies.
Why Serverless Matters in Senior Engineering Interviews
Serverless computing has moved from an experimental deployment model to a foundational pillar of modern cloud architecture. At companies like Amazon, Google, Netflix, and Stripe, serverless is no longer confined to simple webhook handlers. It powers mission-critical event pipelines, real-time data processing, API backends serving millions of requests per second, and orchestration layers that coordinate dozens of downstream services. For senior and staff engineering candidates, demonstrating deep serverless expertise signals that you understand the operational and economic realities of running software at scale in 2026.
Interviewers asking serverless questions at the senior level are not looking for someone who can write a basic Lambda function. They want to see that you understand the execution model at a systems level, that you can reason about cold start latency in the context of p99 SLAs, that you know when serverless is the wrong choice, and that you can design event-driven architectures that are both resilient and cost-efficient. The ability to articulate trade-offs between serverless and container-based approaches, as explored in our Lambda vs Fargate comparison, is a strong differentiator.
This guide covers fifteen serverless interview questions that reflect what top companies actually ask in 2026. Each question includes the interviewer's true intent, a structured answer framework, and follow-up questions you should be prepared to handle. For broader preparation, see our system design interview guide and the learning paths tailored to senior engineers.
1. Explain the serverless execution model and how it differs fundamentally from container-based deployments.
What the interviewer is really asking: Do you understand serverless beyond the marketing pitch? Can you reason about the actual execution lifecycle, resource allocation, and the implications for application design?
Answer framework:
Start with the core abstraction: in serverless computing, the cloud provider manages server provisioning, scaling, and maintenance. The developer deploys a function or application unit, and the provider executes it in response to events. The fundamental difference from containers is the granularity of resource allocation and the lifecycle model.
In a container-based deployment using Kubernetes or ECS, you provision compute capacity that runs continuously. You pay for uptime regardless of whether requests are being processed. You control the operating system, runtime, and scaling configuration. The container starts once and handles many requests over its lifetime.
In serverless, the execution environment is ephemeral. Each function invocation gets an isolated execution context. The provider creates execution environments on demand, routes events to them, and reclaims resources when idle. AWS Lambda, for example, provisions a microVM using Firecracker, loads your code package, initializes the runtime, and then invokes your handler. The key insight is that the provider may reuse a warm execution environment for subsequent invocations, but your code must never depend on this behavior.
The implications for application design are significant. Statelessness is enforced rather than suggested. Any state must live in external services like DynamoDB, S3, or ElastiCache. Connection pooling behaves differently because environments are recycled. Cold starts introduce variable latency that does not exist in always-on containers. The concurrency model is fundamentally different: instead of a thread pool handling requests within a single process, each concurrent request gets its own execution environment.
Discuss the economic model: serverless pricing is per-invocation and per-millisecond of compute. For sporadic workloads, this is dramatically cheaper than provisioned containers. For sustained high-throughput workloads, containers often win on cost. The crossover point depends on request patterns, memory requirements, and execution duration. Understanding serverless architecture at this level separates senior candidates from those with surface-level knowledge.
Follow-up questions:
- How does Firecracker differ from traditional container runtimes and why does AWS use it for Lambda?
- At what request volume does serverless become more expensive than a container-based approach?
- How would you design a serverless application that needs to maintain WebSocket connections?
2. How do you diagnose and mitigate cold start latency in production serverless applications?
What the interviewer is really asking: Can you go beyond knowing that cold starts exist and demonstrate a systematic approach to measuring, understanding, and reducing their impact on real users?
Answer framework:
Begin by explaining what causes cold starts. When no warm execution environment is available, the provider must create one. This involves allocating compute resources, downloading the deployment package, starting the runtime, and running initialization code. On AWS Lambda, cold starts typically add 100ms to 1 second for lightweight runtimes like Node.js or Python, and 2 to 10 seconds for JVM-based runtimes without optimization.
The first step is measurement, not mitigation. Instrument your functions to distinguish cold invocations from warm ones. In AWS Lambda, check whether the execution environment is being reused by setting a global variable during initialization and checking it during invocation. Track cold start percentage and latency separately in your metrics. A function with 2 percent cold starts and a p99 latency of 3 seconds might be acceptable for an async processing pipeline but unacceptable for a synchronous API endpoint.
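A minimal sketch of that detection technique in Python (the log field names are illustrative; any structured shape your log queries can parse will do):

```python
import json
import time

# Module scope runs exactly once per execution environment (the cold start).
_COLD_START = True
_INIT_TIME = time.time()

def handler(event, context):
    global _COLD_START
    is_cold = _COLD_START
    _COLD_START = False  # every later invocation in this environment is warm

    # One structured log line per invocation lets you derive cold start
    # ratio and latency from CloudWatch Logs Insights queries.
    print(json.dumps({
        "requestId": context.aws_request_id,
        "coldStart": is_cold,
        "initAgeSeconds": round(time.time() - _INIT_TIME, 3),
    }))
    return {"statusCode": 200}
```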
For mitigation, discuss a layered approach. First, reduce package size: smaller deployment packages download faster. Use tree-shaking, exclude dev dependencies, and for JVM applications use GraalVM native images or AWS SnapStart, which snapshots the initialized JVM state. Second, increase memory allocation. On Lambda, more memory means more CPU, and the initialization phase runs faster. A function at 1792MB gets a full vCPU and initializes significantly faster than at 128MB. Third, use provisioned concurrency for latency-sensitive paths. This keeps a specified number of execution environments warm and initialized, eliminating cold starts entirely for those instances. The trade-off is cost, since you pay for provisioned concurrency whether it is used or not.
Discuss architectural mitigations. For API Gateway-backed Lambda functions, implement a tiered architecture where API Gateway returns a cached response from a CDN or caching layer for read requests, bypassing Lambda entirely. For the write path, consider whether the client can tolerate async processing via an SQS queue, which removes the cold start from the user-facing latency.
Address the initialization code pattern: move heavy initialization (database connections, SDK clients, ML model loading) outside the handler function to the module scope. This code runs once during cold start and is reused across warm invocations. But be careful with connection limits since each execution environment opens its own connection.
Follow-up questions:
- How does provisioned concurrency interact with auto-scaling and what are the cost implications?
- What is AWS Lambda SnapStart and how does it reduce JVM cold starts?
- How would you handle a scenario where cold starts cause cascading timeouts in a synchronous microservices chain?
3. Design an event-driven order processing pipeline using serverless components.
What the interviewer is really asking: Can you architect a production-grade event-driven system using serverless primitives, handling ordering, idempotency, error handling, and observability?
Answer framework:
Start with the business flow: a customer places an order, which triggers payment processing, inventory reservation, notification, and fulfillment. Each step may fail independently and must be handled gracefully.
The architecture uses an event-driven approach with serverless components. When an order is placed, the API Gateway invokes a Lambda function that validates the order and publishes an OrderPlaced event to an SNS topic or EventBridge event bus. Downstream consumers subscribe to this event: a payment processing Lambda, an inventory Lambda, and a notification Lambda.
For ordering and coordination, use AWS Step Functions as an orchestrator rather than relying on choreography alone. Step Functions provides a state machine that coordinates the saga pattern: try payment, then reserve inventory, then send confirmation. If payment fails, the state machine invokes compensating actions (release the inventory reservation). Step Functions handles retries, timeouts, and error states declaratively.
For idempotency, every Lambda function must be idempotent because events can be delivered more than once. Use DynamoDB conditional writes with an idempotency key (the order ID plus the processing step). Before processing, check if the idempotency record exists. If it does, return the cached result. If not, process the event and atomically write the result and idempotency record.
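A sketch of that conditional-write check with boto3, assuming a hypothetical idempotency-keys table; a production version (such as the idempotency utility in Lambda Powertools) would also set a TTL and handle IN_PROGRESS records left behind by crashed invocations:

```python
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")
TABLE = "idempotency-keys"  # hypothetical table with partition key "pk"

def process_once(order_id: str, step: str, do_work):
    key = f"{order_id}#{step}"
    try:
        # Conditional write: succeeds only if no record exists for this key,
        # so exactly one invocation wins when an event is delivered twice.
        dynamodb.put_item(
            TableName=TABLE,
            Item={"pk": {"S": key}, "status": {"S": "IN_PROGRESS"}},
            ConditionExpression="attribute_not_exists(pk)",
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return {"duplicate": True}  # already processed or in flight
        raise

    result = do_work()  # the actual processing step
    dynamodb.update_item(
        TableName=TABLE,
        Key={"pk": {"S": key}},
        UpdateExpression="SET #s = :done, #r = :res",
        ExpressionAttributeNames={"#s": "status", "#r": "result"},
        ExpressionAttributeValues={
            ":done": {"S": "DONE"},
            ":res": {"S": str(result)},
        },
    )
    return {"duplicate": False, "result": result}
```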
For error handling, configure dead letter queues (DLQ) on every SQS queue and Lambda event source mapping. When a message fails all retry attempts, it moves to the DLQ for manual investigation. Set up CloudWatch alarms on DLQ depth. For transient failures (downstream service unavailable), use exponential backoff with jitter in the retry configuration.
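For the cases where you implement the backoff yourself rather than relying on the SQS redrive policy, the full-jitter variant is a common shape; a sketch:

```python
import random
import time

def retry_with_jitter(operation, max_attempts=5, base=0.2, cap=10.0):
    """Full-jitter exponential backoff around a callable."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # retries exhausted; let the message reach the DLQ
            # Sleep a random duration in [0, min(cap, base * 2^attempt)]
            # so retrying clients spread out instead of stampeding.
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```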
For observability, propagate a correlation ID through every event and function invocation. Use structured logging with the correlation ID, step name, and outcome. Push metrics to CloudWatch or a third-party monitoring system. Enable X-Ray tracing for distributed trace visualization across the entire pipeline. This is essential for debugging issues in production, similar to the monitoring approaches used in Netflix's architecture.
Discuss throughput: SQS provides virtually unlimited throughput, but downstream services (payment gateway, inventory database) have finite capacity. Use Lambda reserved concurrency (or the SQS event source mapping's maximum concurrency setting) to limit the rate at which Lambda processes messages, preventing downstream overload.
Follow-up questions:
- How do you handle a scenario where the payment succeeds but the inventory reservation fails?
- How would you implement exactly-once processing semantics in this pipeline?
- What happens if the Step Functions execution itself fails midway through the saga?
4. How do you handle database connections in a serverless environment where thousands of concurrent Lambda functions may be running?
What the interviewer is really asking: Do you understand the connection exhaustion problem that is unique to serverless, and can you design solutions that balance performance with resource constraints?
Answer framework:
Explain the core problem: traditional connection pooling assumes a long-lived application process that opens N connections and reuses them across requests. In serverless, each execution environment maintains its own connection, and with thousands of concurrent Lambda invocations, you can exhaust database connection limits. A PostgreSQL instance typically supports 100 to 500 connections. If each Lambda invocation opens a connection, 500 concurrent invocations saturate the database.
The primary solution is a connection proxy. AWS RDS Proxy sits between Lambda and the database, maintaining a pool of connections to the database and multiplexing Lambda connections through this pool. Lambda functions connect to the proxy (which supports thousands of client connections) rather than directly to the database. The proxy holds a much smaller number of actual database connections (matching the database limit) and queues requests when all connections are in use.
Discuss how RDS Proxy works internally: it uses connection pinning to associate a client session with a backend connection for the duration of a transaction. Between transactions, the backend connection returns to the pool. This means your Lambda code should use short transactions and avoid session-level state (temporary tables, SET commands) that would force prolonged pinning.
For DynamoDB and other serverless-native databases, connection management is handled by the SDK using stateless HTTP requests rather than long-lived database protocol connections. This eliminates the connection pooling problem entirely, which is one reason serverless architectures often pair Lambda with DynamoDB.
Discuss alternatives for non-AWS databases: use connection pooling services like PgBouncer deployed on a small EC2 instance or Fargate container. For Redis, use ElastiCache Serverless which handles connection management automatically.
Address the initialization pattern: open the database connection outside the handler function. This way, warm invocations reuse the existing connection. Implement connection health checks that verify the connection is still valid before use, since long-idle connections may be terminated by the database or network infrastructure.
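A sketch of that pattern, assuming a bundled psycopg2 driver and a DATABASE_URL environment variable holding the connection string:

```python
import os
import psycopg2  # assumes the driver is packaged with the function

_conn = None  # module scope, so warm invocations reuse the connection

def get_connection():
    global _conn
    if _conn is None or _conn.closed:
        _conn = psycopg2.connect(os.environ["DATABASE_URL"])
    try:
        with _conn.cursor() as cur:
            cur.execute("SELECT 1")  # cheap liveness probe before reuse
    except psycopg2.Error:
        # Long-idle connections can be silently dropped; reconnect.
        _conn = psycopg2.connect(os.environ["DATABASE_URL"])
    return _conn

def handler(event, context):
    with get_connection().cursor() as cur:
        cur.execute("SELECT now()")
        return {"serverTime": str(cur.fetchone()[0])}
```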
Follow-up questions:
- How does RDS Proxy handle transaction isolation across multiplexed connections?
- What database choices would you make for a greenfield serverless application and why?
- How do you monitor and alert on connection pool exhaustion in a serverless context?
5. Compare the serverless offerings across AWS, GCP, and Azure for a multi-cloud strategy.
What the interviewer is really asking: Do you have breadth of knowledge across cloud providers, and can you reason about vendor lock-in, portability, and the practical differences that affect architectural decisions?
Answer framework:
Structure the comparison across several dimensions: compute, event sources, orchestration, observability, and ecosystem maturity. This directly relates to the broader AWS vs GCP vs Azure comparison.
For compute, AWS Lambda supports up to 10GB memory and 15-minute execution timeout. Google Cloud Functions (2nd gen, built on Cloud Run) supports up to 32GB memory and 60-minute timeout. Azure Functions supports up to 14GB memory and unbounded execution time for premium plans. Lambda has the most mature cold start optimization (SnapStart, provisioned concurrency). Cloud Functions 2nd gen leverages Cloud Run's container infrastructure, offering more flexibility but slightly different cold start characteristics. Azure Functions offers a unique Durable Functions framework for stateful orchestration.
For event sources, AWS has the deepest integration: over 200 event sources including S3, DynamoDB Streams, Kinesis, SQS, SNS, EventBridge, API Gateway, ALB, IoT Core, and Cognito. GCP integrates with Cloud Storage, Pub/Sub, Firestore, and Eventarc. Azure integrates with Blob Storage, Event Grid, Service Bus, Cosmos DB, and Event Hubs. AWS's breadth of event sources is a significant advantage for complex event-driven architectures.
For orchestration, AWS Step Functions provides visual state machines with built-in retry and error handling. GCP Cloud Workflows offers similar functionality with YAML-based workflow definitions. Azure Durable Functions uses a code-first approach where the orchestration logic is written in the same language as the functions.
For portability, discuss the Serverless Framework and similar tools that abstract provider differences. In practice, true portability is difficult because the value of serverless comes from deep integration with provider-specific services. A Lambda function triggered by DynamoDB Streams processing events into Kinesis cannot be trivially ported to GCP. For organizations requiring multi-cloud, consider using containers on Kubernetes with Knative as a serverless abstraction layer that runs on any Kubernetes cluster.
Discuss pricing models: Lambda charges per request and per GB-second. Cloud Functions charges along the same dimensions, with a slightly different free tier. The Azure Functions consumption plan charges per execution and per GB-second with a generous free monthly grant.
Follow-up questions:
- If you had to build a serverless application that could run on any cloud, how would you architect it?
- What are the practical implications of Lambda's 15-minute timeout versus Cloud Functions' 60-minute timeout?
- How do the cold start characteristics differ across these three providers?
6. How do you design a serverless application for cost optimization at scale?
What the interviewer is really asking: Can you think beyond the simplistic pay-per-use narrative and understand the nuanced cost dynamics of serverless at high volume, including hidden costs that can dwarf compute charges?
Answer framework:
Start by identifying the actual cost components. Compute cost (Lambda GB-seconds) is often the most visible but not always the largest. API Gateway costs $3.50 per million requests. Data transfer costs accumulate when functions communicate with services across availability zones or regions. CloudWatch Logs costs $0.50 per GB ingested, and verbose logging from thousands of concurrent functions generates significant volume. DynamoDB read/write capacity, S3 request costs, and SNS/SQS per-message charges all add up.
For compute optimization, right-size memory allocation. Use AWS Lambda Power Tuning (an open-source tool) to find the optimal memory setting for each function. Often, increasing memory reduces execution time proportionally, resulting in the same or lower cost with better performance. For example, a function at 128MB running for 1000ms costs the same as the same function at 256MB running for 500ms, but the 256MB version provides better user experience.
For invocation optimization, reduce unnecessary invocations. Batch processing with SQS allows a single Lambda invocation to process up to 10,000 messages. Use EventBridge content-based filtering to route only relevant events to each function rather than filtering inside the function. Implement caching at the API Gateway level to avoid invoking Lambda for repeated identical requests.
For architecture-level optimization, evaluate whether high-throughput steady-state workloads should run on Fargate instead of Lambda. A function processing 100M requests per month at 200ms average duration might cost significantly more on Lambda than on a small Fargate service. Use Lambda for spiky, unpredictable workloads and Fargate for sustained baseline load.
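A back-of-the-envelope model makes the crossover concrete. The prices below are illustrative us-east-1 on-demand figures and ignore free tiers, savings plans, and Fargate Spot; verify against the current price list before relying on them:

```python
# Illustrative us-east-1 on-demand prices (check the current price list).
LAMBDA_GB_SECOND = 0.0000166667     # $ per GB-second of compute
LAMBDA_REQUEST = 0.20 / 1_000_000   # $ per invocation
FARGATE_VCPU_HOUR = 0.04048         # $ per vCPU-hour
FARGATE_GB_HOUR = 0.004445          # $ per GB-hour

def lambda_monthly(requests, duration_ms, memory_gb):
    compute = requests * (duration_ms / 1000) * memory_gb * LAMBDA_GB_SECOND
    return compute + requests * LAMBDA_REQUEST

def fargate_monthly(tasks, vcpu, memory_gb, hours=730):
    return tasks * hours * (vcpu * FARGATE_VCPU_HOUR + memory_gb * FARGATE_GB_HOUR)

# 100M requests/month at 200ms and 512MB, vs two always-on 1 vCPU / 2GB tasks
print(f"Lambda:  ${lambda_monthly(100_000_000, 200, 0.5):,.2f}")  # ~ $186.67
print(f"Fargate: ${fargate_monthly(2, 1, 2):,.2f}")               # ~ $72.08
```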
Discuss logging costs: implement structured logging with configurable log levels. In production, log at WARN or ERROR level by default and enable DEBUG dynamically when investigating issues. Use CloudWatch Logs subscription filters to extract metrics rather than retaining all log data. Set retention policies to delete old logs automatically.
Address the cost of idle provisioned concurrency. If you provision 100 instances at 512MB, you pay roughly $0.75 per hour at us-east-1 rates regardless of utilization. Use scheduled scaling to reduce provisioned concurrency during off-peak hours. Monitor the ratio of provisioned to on-demand invocations to ensure you are not over-provisioning.
Follow-up questions:
- How would you build a cost attribution model for a serverless application shared across multiple teams?
- What strategies would you use to predict serverless costs before deployment?
- How do reserved concurrency and provisioned concurrency affect cost differently?
7. How do you implement authentication and authorization in a serverless API?
What the interviewer is really asking: Do you understand the security model specific to serverless APIs, including the various authorizer types, token validation strategies, and the principle of least privilege for function execution roles?
Answer framework:
For API Gateway-backed Lambda functions, discuss the three authorizer types. Lambda authorizers (formerly custom authorizers): a dedicated Lambda function receives the request token, validates it (verifying JWT signature, checking expiration, looking up permissions), and returns an IAM policy document that API Gateway caches for a configurable TTL. This is the most flexible approach and supports any token format.
Cognito authorizers: API Gateway natively validates JWT tokens issued by Amazon Cognito user pools. No custom code needed for basic authentication. However, authorization logic (role-based access) still requires implementation in the downstream Lambda function or a Lambda authorizer.
IAM authorization: the client signs requests using AWS SigV4. Best for service-to-service communication where both sides have AWS credentials. Used heavily in internal microservices architectures.
For JWT validation in a Lambda authorizer, use the provider's JWKS (JSON Web Key Set) endpoint to fetch public keys. Cache the JWKS in the execution environment (module-level variable) to avoid fetching on every invocation. Validate the token signature, issuer, audience, and expiration. Extract claims (user ID, roles, permissions) and construct the authorization policy.
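A sketch with PyJWT's JWKS client (the environment variable names are assumptions; keeping the client at module scope is what caches the keys across warm invocations):

```python
import os
import jwt  # PyJWT, installed with its "crypto" extra

# Module scope: the client fetches and caches JWKS keys across invocations.
_jwks_client = jwt.PyJWKClient(os.environ["JWKS_URL"])

def validate_token(token: str) -> dict:
    signing_key = _jwks_client.get_signing_key_from_jwt(token)
    # decode() verifies signature, expiration, issuer, and audience together.
    return jwt.decode(
        token,
        signing_key.key,
        algorithms=["RS256"],
        issuer=os.environ["TOKEN_ISSUER"],
        audience=os.environ["TOKEN_AUDIENCE"],
    )
```

The authorizer then maps the returned claims onto the IAM policy document that API Gateway caches.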
For fine-grained authorization, discuss the difference between authentication (who is the user) and authorization (what can they do). Implement RBAC (Role-Based Access Control) or ABAC (Attribute-Based Access Control) in the Lambda function. For complex authorization logic, consider a dedicated authorization service (like OPA or Cedar) that the Lambda function queries.
For IAM execution roles, apply the principle of least privilege. Each Lambda function should have its own IAM role with only the permissions it needs. A function that reads from DynamoDB and writes to S3 should not have permissions to invoke other Lambda functions or access unrelated resources. Use IAM condition keys to restrict access to specific DynamoDB tables and S3 prefixes.
Discuss secrets management: never embed API keys or database credentials in Lambda code or environment variables in plaintext. Use AWS Secrets Manager or SSM Parameter Store. Cache secrets in the execution environment with a TTL to reduce API calls.
Follow-up questions:
- How does API Gateway authorizer caching work and what are the security implications of caching authorization decisions?
- How would you implement rate limiting per API key in a serverless API?
- How do you handle authorization for WebSocket APIs differently from REST APIs?
8. Design a serverless data processing pipeline that ingests, transforms, and loads terabytes of data daily.
What the interviewer is really asking: Can you apply serverless to data engineering at scale, understanding the boundaries of Lambda's execution model and when to combine serverless with other compute models?
Answer framework:
Define the pipeline stages: ingestion, transformation, enrichment, and loading. The key constraints are Lambda's 15-minute timeout and 10GB memory limit: files too large to process within that time and memory must be split.
For ingestion, files land in S3 (via direct upload, Kinesis Firehose, or partner data feeds). S3 event notifications trigger a Lambda function for each new object. For high-volume ingestion (thousands of files per second), use S3 event notifications to SQS rather than direct Lambda invocation. SQS provides buffering and allows you to control the processing rate with reserved concurrency.
For transformation, the Lambda function reads the file from S3, applies transformations (parsing, validation, schema normalization, PII redaction), and writes the result to a staging location in S3. For files that exceed Lambda's capacity, use a fan-out pattern: a coordinator Lambda splits the file into chunks and triggers a worker Lambda per chunk. Use Step Functions to coordinate the fan-out and fan-in, waiting for all workers to complete before triggering the load step.
For enrichment (joining with reference data, geocoding, entity resolution), the Lambda function queries DynamoDB or an API for enrichment data. Cache frequently accessed reference data in the execution environment. For expensive enrichment operations, batch requests to downstream services.
For loading, write transformed data in Parquet format to an S3-based data lake partitioned by date. Trigger a Glue Crawler or directly update Athena table partitions. For loading into a data warehouse like Redshift, use the COPY command via a Lambda function or Redshift Data API.
Discuss error handling at each stage. Use SQS DLQ for failed records. Implement circuit breakers for downstream service failures. For partial failures (some records in a batch fail), separate successful records from failures rather than retrying the entire batch.
For monitoring, track pipeline throughput (records per second), latency (time from ingestion to availability), error rate, and data quality metrics. Alert on pipeline anomalies like sudden drops in throughput or spikes in error rate.
Address cost comparison: for a pipeline processing 1TB daily with an average file size of 10MB (roughly 100,000 files, and therefore invocations, per day), estimate the total Lambda execution time and cost and compare with a Fargate or EMR Serverless alternative. Serverless excels when the ingestion pattern is bursty, but a steady stream might be cheaper on containers.
Follow-up questions:
- How would you handle schema evolution in the pipeline without downtime?
- What happens if a source system sends duplicate records and how do you ensure exactly-once processing?
- How would you implement backfill processing for historical data reprocessing?
9. How do you implement observability in a serverless application where you have no access to the underlying infrastructure?
What the interviewer is really asking: Can you build comprehensive observability without traditional infrastructure metrics, using the three pillars of metrics, logs, and traces in a serverless context?
Answer framework:
The observability challenge in serverless is unique: you cannot install agents on servers, you cannot access system-level metrics like CPU or memory utilization at the OS level, and the ephemeral nature of execution environments makes traditional APM approaches difficult.
For metrics, use CloudWatch embedded metrics format (EMF) to publish custom metrics directly from Lambda function logs. This is more cost-effective than calling the CloudWatch PutMetricData API. Track business metrics (orders processed, payments completed), technical metrics (function duration by percentile, cold start ratio, error rate by error type), and integration metrics (downstream service latency, connection pool utilization). Use CloudWatch Contributor Insights to identify the top-N most invoked functions, slowest functions, and most error-prone functions.
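A minimal EMF emitter looks like the following (the namespace and dimension names are illustrative). Because the metric rides on a log line, there is no PutMetricData call and no added network latency:

```python
import json
import time

def emit_metric(name, value, unit="Count", **dimensions):
    """Print one CloudWatch Embedded Metric Format record; CloudWatch
    extracts it from the log stream as a custom metric."""
    print(json.dumps({
        "_aws": {
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": "OrderPipeline",  # illustrative namespace
                "Dimensions": [list(dimensions.keys())],
                "Metrics": [{"Name": name, "Unit": unit}],
            }],
        },
        name: value,
        **dimensions,
    }))

# Inside a handler: emit_metric("OrdersProcessed", 1, Service="checkout")
```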
For structured logging, adopt a consistent JSON log format across all functions including timestamp, request ID, correlation ID, function name, log level, and the message. Use Lambda Powertools (available for Python, TypeScript, Java, and .NET) which provides structured logging, custom metrics, and tracing out of the box. Include the AWS X-Ray trace ID in every log entry for correlation. Forward logs to a centralized aggregation service via CloudWatch Logs subscription filters (for example, into Kinesis Firehose and on to S3 or a SIEM).
For distributed tracing, enable AWS X-Ray on all Lambda functions and API Gateway. X-Ray traces a request across API Gateway, Lambda, DynamoDB, SQS, SNS, and other AWS services, showing the latency contribution of each component. Annotate traces with business context (order ID, customer tier, operation type) so you can search for traces by business attributes. For functions that call external services, use the X-Ray SDK to instrument outbound HTTP calls.
Discuss the correlation ID pattern in depth. Generate a unique correlation ID at the entry point (API Gateway or the first Lambda in a chain). Propagate it through all events, messages, and function invocations. When investigating an issue, search logs by correlation ID to reconstruct the entire request flow across all services. This is analogous to how Uber's architecture traces ride requests across dozens of microservices.
For alerting, create CloudWatch alarms on error rates, duration anomalies, throttling, and DLQ depth. Use composite alarms to reduce noise. Set up a dashboard that shows the health of the entire serverless application at a glance.
Follow-up questions:
- How do you debug a Lambda function that fails intermittently but works most of the time?
- How would you implement canary deployments for Lambda functions with automated rollback based on error rate?
- What is the cost of comprehensive observability and how do you balance coverage with expense?
10. How do you handle long-running workflows in a serverless architecture that exceed Lambda's execution timeout?
What the interviewer is really asking: Do you understand the boundaries of serverless compute and can you design patterns that work within those constraints for complex business processes?
Answer framework:
Lambda's maximum execution time is 15 minutes. Many real-world workflows take longer: video transcoding, large data migrations, ML model training, multi-step approval processes that wait for human input, and batch processing jobs that span hours.
The primary solution for orchestrated workflows is AWS Step Functions. Standard Workflows can run for up to one year; the state machine coordinates multiple Lambda invocations, waits for callbacks, handles retries, and maintains workflow state durably. Standard Workflows are priced per state transition (suitable for long-running, low-volume workflows), while Express Workflows are priced per invocation and duration and capped at five minutes (suitable for high-volume, short-duration workflows).
Design patterns for long-running processes. The continuation pattern: a Lambda function processes a batch of records, checkpoints its progress to DynamoDB, and if time is running out (check context.getRemainingTimeInMillis()), it invokes itself asynchronously with the checkpoint position. The next invocation resumes from where the previous one stopped. This is simple but has drawbacks: it is harder to monitor and debug than Step Functions.
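A sketch of that pattern (load_records and process are hypothetical stand-ins; here the checkpoint travels in the payload, while persisting it to DynamoDB as described above also protects against a failed self-invocation). In Python the remaining-time call is context.get_remaining_time_in_millis():

```python
import json
import boto3

lambda_client = boto3.client("lambda")
SAFETY_MARGIN_MS = 30_000  # stop well before the 15-minute hard limit

def load_records():
    """Hypothetical: fetch the full list of work items."""
    return []

def process(record):
    """Hypothetical: per-record business logic."""

def handler(event, context):
    records = load_records()
    for i in range(event.get("checkpoint", 0), len(records)):
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            # Out of time: hand the checkpoint to a fresh invocation.
            lambda_client.invoke(
                FunctionName=context.function_name,
                InvocationType="Event",  # asynchronous self-invocation
                Payload=json.dumps({"checkpoint": i}),
            )
            return {"status": "continued", "checkpoint": i}
        process(records[i])
    return {"status": "complete"}
```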
For human-in-the-loop workflows (approval processes, manual review steps), use Step Functions with task tokens. The state machine pauses at an activity or callback state, generating a task token. A notification is sent to the human reviewer. When they approve or reject, an API call sends the result with the task token, and the state machine resumes. The workflow can wait for days or weeks.
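Resuming the workflow is a single API call. A sketch, assuming the approval endpoint passes the stored task token back in its event:

```python
import boto3

sfn = boto3.client("stepfunctions")

def approval_handler(event, context):
    """Invoked by the approval API; resumes the paused state machine."""
    token = event["taskToken"]  # issued when the workflow paused
    if event.get("decision") == "approve":
        sfn.send_task_success(taskToken=token, output='{"approved": true}')
    else:
        sfn.send_task_failure(taskToken=token, error="Rejected",
                              cause="Reviewer declined the request")
```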
For compute-intensive long-running jobs that need more than 15 minutes of continuous compute (video processing, ML training), serverless may not be the right compute model. Use Fargate for tasks up to hours or EC2 for even longer. But orchestrate these tasks using Step Functions. The state machine invokes a Fargate task, waits for completion using a callback pattern, and then continues with Lambda functions for subsequent steps.
Discuss the scatter-gather pattern for parallelizable work: a coordinator Lambda splits the work into chunks and uses the Step Functions distributed Map state to process chunks in parallel (up to 10,000 concurrent child executions). Each branch runs a Lambda function, and the Map state waits for all branches to complete before proceeding.
For periodic long-running processes (nightly batch jobs), use EventBridge Scheduler to trigger a Step Functions execution at a scheduled time. The state machine orchestrates the entire batch job with proper error handling, retries, and notifications.
Follow-up questions:
- How do Step Functions Standard Workflows and Express Workflows differ and when would you choose each?
- What happens if a Step Functions execution fails midway and how do you implement recovery?
- How would you migrate a monolithic batch job running on EC2 to a serverless architecture?
11. How do you design serverless applications for multi-region active-active deployment?
What the interviewer is really asking: Can you extend serverless architecture beyond a single region, handling data replication, request routing, and consistency challenges in a globally distributed serverless application?
Answer framework:
Multi-region serverless deployment requires addressing three challenges: compute deployment, data replication, and traffic routing.
For compute, deploy the same Lambda functions to multiple regions using infrastructure as code (CloudFormation StackSets, Terraform, or CDK Pipelines with cross-region deployment stages). Use a CI/CD pipeline that deploys to a primary region first, runs integration tests, and then deploys to secondary regions. API Gateway endpoints are regional by default. Create a regional API in each target region.
For traffic routing, use Route 53 with latency-based routing or geolocation routing. Latency-based routing directs users to the region with the lowest network latency. Geolocation routing directs users based on their geographic location. Use health checks to automatically failover traffic away from an unhealthy region.
For data replication, this is the hardest challenge. DynamoDB Global Tables provide fully managed multi-region, multi-active replication with last-writer-wins conflict resolution. Writes in any region are replicated to all other regions within seconds. For use cases requiring stronger consistency, you need application-level conflict resolution. S3 Cross-Region Replication handles file data. For relational databases, Aurora Global Database provides cross-region read replicas with promotion capability for failover.
Discuss consistency trade-offs using the CAP theorem perspective. In an active-active configuration with DynamoDB Global Tables, you get eventual consistency between regions. A user who writes in us-east-1 and immediately reads in eu-west-1 might see stale data. For most applications this is acceptable if the replication lag is under one second. For applications requiring strict consistency (financial transactions), designate a primary region for writes and route all writes there, while serving reads from the nearest region.
For event-driven architectures, EventBridge Global Endpoints provide automatic failover for event buses. SQS does not natively replicate across regions, so if you need cross-region message processing, publish to SNS which can fan out to SQS queues in multiple regions.
Address operational concerns: centralize logging and metrics from all regions into a single observability platform. Use a global dashboard showing health and performance per region. Implement automated failover runbooks that can redirect traffic within seconds of detecting a regional outage.
Follow-up questions:
- How do you handle deployments across regions without causing inconsistencies during the rollout window?
- What data conflicts can arise with DynamoDB Global Tables and how would you handle them?
- How do you test multi-region failover without affecting production users?
12. Explain the security model for serverless applications and how it differs from traditional application security.
What the interviewer is really asking: Do you understand that serverless shifts but does not eliminate security responsibilities, and can you articulate the shared responsibility model, common attack vectors, and defense strategies specific to serverless?
Answer framework:
In serverless, the shared responsibility model shifts significantly. The provider handles OS patching, runtime updates, network security of the execution environment, and physical security. You are responsible for application code security, IAM permissions, data encryption, input validation, dependency management, and configuration.
The biggest serverless-specific attack vector is event injection. In traditional applications, input comes through well-defined HTTP endpoints with standard input validation middleware. In serverless, functions are triggered by diverse event sources: S3 object keys (which can contain malicious content in the filename), SQS message bodies, DynamoDB stream records, and API Gateway requests. Each event source has a different payload format. Every event source is an attack surface that needs input validation. A common vulnerability is using S3 object keys directly in SQL queries or shell commands without sanitization.
For IAM security, each Lambda function should have a dedicated IAM role with minimal permissions. Avoid using a shared role across functions. Use IAM policy conditions to restrict access to specific resources (specific DynamoDB tables, specific S3 prefixes). Enable resource-based policies on Lambda functions to control which services and accounts can invoke them. Audit permissions regularly using IAM Access Analyzer.
For dependency security, serverless applications rely heavily on third-party packages. A compromised npm or PyPI package in your Lambda function can exfiltrate environment variables (which may contain secrets), make outbound API calls, or access any AWS service that the function's IAM role allows. Use tools like Snyk, Dependabot, or AWS Inspector to scan dependencies for known vulnerabilities. Pin dependency versions and use lock files.
For data security, encrypt all data at rest (S3 SSE, DynamoDB encryption, Secrets Manager encryption) and in transit (enforce TLS). Use VPC-attached Lambda functions when accessing resources in a private VPC, but understand the cold start penalty this historically imposed (largely mitigated since 2019 with Hyperplane ENI improvements).
For runtime security, consider using Lambda extensions for real-time security monitoring. These run as separate processes in the execution environment and can monitor function behavior, detect anomalies, and enforce security policies without modifying function code. Companies like Amazon and Google invest heavily in these runtime protections for their own serverless platforms.
Follow-up questions:
- How do you handle secrets rotation in a serverless environment without redeploying functions?
- What is the OWASP Serverless Top 10 and how does it differ from the traditional OWASP Top 10?
- How would you implement a Web Application Firewall for a serverless API?
13. How do you test serverless applications, from unit tests to integration tests to end-to-end tests?
What the interviewer is really asking: Can you implement a comprehensive testing strategy that accounts for the unique challenges of testing event-driven, cloud-integrated serverless applications?
Answer framework:
Serverless testing is challenging because functions are deeply integrated with cloud services. You cannot fully test a Lambda function that reads from DynamoDB, writes to S3, and publishes to SNS without either using real AWS services or creating convincing mocks.
For unit testing, isolate your business logic from the Lambda handler and AWS SDK calls. Structure your code so the handler function parses the event, calls a pure business logic function, and then interacts with AWS services. Test the business logic function with standard unit testing frameworks (Jest, pytest, JUnit). Mock AWS SDK calls using libraries like aws-sdk-mock (Node.js), moto (Python), or localstack.
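A minimal illustration of that separation (the discount rule is a stand-in for real business logic):

```python
# handler.py: a thin adapter around pure, AWS-free business logic.
def calculate_discount(order_total: float, tier: str) -> float:
    rate = {"gold": 0.10, "silver": 0.05}.get(tier, 0.0)
    return round(order_total * rate, 2)

def handler(event, context):
    # Shape depends on the event source; API Gateway delivers the body
    # as a JSON string you would json.loads first.
    body = event["body"]
    return {"discount": calculate_discount(body["total"], body["tier"])}

# test_handler.py: plain pytest, no mocks needed for the logic itself.
def test_gold_tier_gets_ten_percent():
    assert calculate_discount(200.0, "gold") == 20.0

def test_unknown_tier_gets_no_discount():
    assert calculate_discount(200.0, "platinum") == 0.0
```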
For integration testing, use real AWS services in a dedicated test environment. Deploy the serverless application to a test stage and run tests that invoke actual Lambda functions, write to actual DynamoDB tables, and verify end-to-end behavior. Use tools like SST (Serverless Stack) which provides a live development mode that runs Lambda functions locally but connects to real AWS resources. This gives fast iteration cycles while testing against real infrastructure.
For contract testing, when your Lambda function is part of a microservices architecture, use Pact or similar tools to verify that the event schemas between services are compatible. Define the expected event format at each boundary and verify that producers and consumers agree.
For load testing, use tools like Artillery or k6 to generate realistic traffic patterns against your serverless API. Measure cold start ratio under load, observe how concurrency scaling behaves, and verify that downstream services (databases, external APIs) handle the concurrent load. Pay attention to Lambda throttling: the default regional concurrency limit is 1000. Request a limit increase before load testing.
For chaos testing, use AWS Fault Injection Simulator to inject failures into your serverless infrastructure: increase Lambda function latency, throttle DynamoDB reads, or disrupt SQS message delivery. Verify that your application degrades gracefully.
Discuss the testing pyramid for serverless: the base should be unit tests of business logic (fast, many). The middle layer is integration tests with real AWS services (slower, focused on critical paths). The top is end-to-end tests (slowest, fewest, covering user journeys). As described in our distributed systems guide, testing distributed architectures requires deliberate investment in infrastructure.
Follow-up questions:
- How do you test Step Functions state machines locally?
- What is the role of LocalStack in serverless testing and what are its limitations?
- How do you implement test data management for integration tests that use real DynamoDB tables?
14. When should you NOT use serverless? What are the anti-patterns?
What the interviewer is really asking: Can you demonstrate the maturity to recognize when serverless is the wrong tool, showing that your recommendations are driven by requirements rather than technology preference?
Answer framework:
Serverless is not universally optimal. Understanding its limitations is as important as understanding its strengths, and this is what separates senior engineers from enthusiasts.
Anti-pattern one: sustained high-throughput workloads. A service handling 10,000 requests per second consistently, 24/7, will almost certainly be cheaper on containers. Serverless pricing is linear with invocations. Containers have a fixed base cost that is amortized across requests. At high, steady utilization, the per-request cost on Fargate or ECS can be 3 to 5 times lower than Lambda. Use an analysis like our Lambda vs Fargate comparison to make data-driven decisions.
Anti-pattern two: latency-sensitive applications requiring sub-10ms response times. Even warm Lambda invocations add 1 to 5ms of overhead from the Lambda service routing layer. For ultra-low-latency use cases (real-time bidding, high-frequency trading, game servers), the overhead is unacceptable. Use dedicated instances or containers.
Anti-pattern three: long-running compute tasks. Video transcoding, ML model training, and large batch processing jobs that need continuous compute for hours do not fit Lambda's 15-minute timeout. You can work around this with Step Functions, but the complexity may not be worth it compared to running a Fargate task.
Anti-pattern four: workloads requiring significant local state. If your application needs to maintain a large in-memory cache, a persistent connection pool, or GPU access, serverless is not appropriate. Lambda execution environments are ephemeral and limited in memory (10GB max).
Anti-pattern five: applications with heavy binary dependencies. If your function requires a large runtime (custom ML frameworks, video processing libraries), the deployment package size limit (250MB unzipped, or 10GB with container image deployment) and the cold start latency of loading these dependencies become problematic.
Anti-pattern six: vendor lock-in sensitivity. Serverless applications are deeply integrated with provider-specific services. If your organization requires cloud portability, a container-based approach with Kubernetes provides better portability.
Discuss the hybrid approach: use serverless for event-driven glue logic, API handlers, and scheduled tasks while using containers for sustained compute, stateful services, and latency-critical paths. Most production architectures at companies like Amazon use both models together.
Follow-up questions:
- How would you make the business case for migrating a containerized service to serverless or vice versa?
- What metrics would you use to continuously evaluate whether serverless is the right choice for a particular workload?
- How do you handle organizational resistance to serverless from teams accustomed to managing their own infrastructure?
15. Design a serverless architecture for a real-time event processing system handling 1 million events per second.
What the interviewer is really asking: Can you push serverless to its limits, understand where those limits are, and design a hybrid architecture that leverages serverless strengths while compensating for its weaknesses at extreme scale?
Answer framework:
At 1 million events per second, you are operating at a scale where naive serverless design breaks down. This requires careful architectural decisions at every layer.
For ingestion, Amazon Kinesis Data Streams can handle millions of records per second across multiple shards. Each shard supports 1,000 records or 1MB per second for writes and 2MB per second for reads. For 1M events per second, provision 1,000 or more shards (more if the average event exceeds 1KB). Alternatively, use Amazon MSK (Managed Streaming for Apache Kafka), which provides higher per-partition throughput. The choice between Kinesis and Kafka depends on existing team expertise and the broader ecosystem requirements.
For processing, Lambda can consume from Kinesis using the event source mapping with parallelization factor up to 10 per shard. With 1,000 shards and parallelization factor 10, you get 10,000 concurrent Lambda invocations processing the stream. Each invocation receives a batch of records (configurable batch size and batch window). The Lambda function processes the batch, performs transformations, enrichment, and routing.
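A sketch of a Kinesis batch handler that reports partial failures, so one poison record does not force the whole batch to retry. This requires ReportBatchItemFailures enabled on the event source mapping, and process is a hypothetical stand-in:

```python
import base64
import json

def process(payload):
    """Hypothetical: per-event transformation and routing."""

def handler(event, context):
    failures = []
    for record in event["Records"]:
        try:
            payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
            process(payload)
        except Exception:
            # Report only the failed sequence numbers; Lambda retries from
            # the earliest failure instead of replaying the whole batch.
            failures.append(
                {"itemIdentifier": record["kinesis"]["sequenceNumber"]})
    return {"batchItemFailures": failures}
```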
However, at this scale, Lambda's regional concurrency limit (default 1,000, can be increased to tens of thousands) becomes a constraint. Request a limit increase well in advance. Also consider the cost: 10,000 concurrent Lambda invocations each running continuously will be expensive. For the stream processing layer, Fargate or a dedicated Flink cluster on EMR Serverless might be more cost-effective for the sustained compute portion.
The recommended hybrid architecture: use Kinesis or MSK for ingestion, Apache Flink on Amazon Managed Service for Apache Flink (formerly Kinesis Data Analytics) for the high-throughput stream processing (windowed aggregations, complex event processing, stateful transformations), and Lambda for the event-driven downstream actions (sending notifications, updating DynamoDB, triggering workflows).
For the storage layer, write processed events to S3 in Parquet format for the data lake, DynamoDB for real-time lookups, and ElastiCache for hot data. Use Kinesis Firehose for the S3 delivery with automatic batching, compression, and partitioning.
For fault tolerance, handle poison pills (malformed events that cause processing failures) by routing them to a DLQ after configurable retry attempts. Implement checkpointing so that if a processor fails, it resumes from the last checkpoint rather than reprocessing the entire stream. Monitor iterator age (the age of the oldest unprocessed record) as the primary health metric. Alert if it exceeds your latency SLA, which might indicate that processing is falling behind ingestion.
Address back-pressure: if downstream services (DynamoDB, external APIs) cannot keep up with the processing rate, buffer writes in SQS and process them at a controlled rate using Lambda reserved concurrency. This prevents cascade failures and is similar to how Netflix handles traffic spikes in their event pipeline.
Follow-up questions:
- How would you handle exactly-once processing semantics at this scale?
- What happens during a Kinesis shard split or merge and how does it affect processing?
- How would you implement real-time anomaly detection on this event stream using serverless components?
Common Mistakes in Serverless Interviews
- Treating serverless as a silver bullet. Failing to acknowledge limitations or anti-patterns signals inexperience. Always discuss when serverless is not the right choice and what alternatives you would consider.
- Ignoring cold start implications. Dismissing cold starts as a solved problem shows you have not operated serverless at scale. Discuss measurement, mitigation strategies (provisioned concurrency, SnapStart, architecture changes), and how cold starts affect p99 latency.
- Not understanding the cost model beyond compute. Lambda invocation cost is just one component. API Gateway, data transfer, logging, DynamoDB capacity, and S3 request costs can exceed compute costs. Show that you can estimate total cost of ownership.
- Designing stateful Lambda functions. Assuming the execution environment will be reused or storing state in the /tmp directory across invocations is a common mistake. Design for statelessness and use external state stores.
- Neglecting observability. Without servers to SSH into, comprehensive observability is critical. Not having a strategy for metrics, structured logging, and distributed tracing in a serverless context is a red flag.
How to Prepare for Serverless Interviews
Build real serverless applications. Deploy a multi-function application with API Gateway, Lambda, DynamoDB, SQS, and Step Functions. Intentionally break things: trigger cold starts, exceed concurrency limits, introduce downstream failures, and observe how the system behaves. This hands-on experience is irreplaceable.
Study how serverless works at the infrastructure level. Understand Firecracker, the Lambda execution model, and how provisioned concurrency actually works. Read the AWS Lambda Operator Guide and the Well-Architected Serverless Applications Lens.
Practice cost estimation. Given a workload description (request rate, execution duration, memory, data transfer), estimate the monthly cost on Lambda versus Fargate versus EC2. Use the AWS Pricing Calculator to verify your estimates.
Study serverless patterns from the learning paths: saga pattern, fan-out/fan-in, event sourcing, CQRS, and the strangler fig migration pattern. Understand when each pattern applies and its trade-offs.
For a complete interview preparation strategy, see our system design interview guide and review pricing plans for access to all practice materials.