Serverless Architecture Explained: When Functions Replace Servers
How serverless architecture works — cold starts, event-driven execution, cost models, and when serverless saves money vs when it becomes expensive.
Serverless architecture is a cloud execution model where the cloud provider dynamically allocates compute resources per request, charging only for actual execution time rather than reserved capacity.
What It Really Means
Serverless does not mean "no servers." It means you do not manage, provision, or think about servers. You write functions, deploy them, and the cloud provider handles everything else: scaling, OS patching, load balancing, availability zones, and capacity planning. You pay per invocation and per millisecond of execution — not per hour of a running server.
The model fundamentally changes how you think about infrastructure costs. A traditional server costs money 24/7 whether it handles zero requests or ten thousand. A serverless function costs nothing when idle and scales automatically under load. For sporadic workloads — webhooks, scheduled jobs, event processors — this can reduce costs by 90% or more.
But serverless introduces constraints. Functions have hard execution time limits (15 minutes on AWS Lambda). Cold starts add latency whenever a new execution environment must be initialized. Statelessness is effectively enforced: a warm environment may happen to retain in-memory state between invocations, but you cannot rely on it, so durable state must live in an external store. And at steady high throughput, serverless becomes more expensive than reserved instances because you pay a premium for elasticity.
How It Works in Practice
The Execution Model
An incoming event (an HTTP request, a file landing in object storage, a message on a queue) triggers an invocation. If an idle execution environment is available, the provider reuses it and your handler runs immediately (a warm invocation). If not, the provider must create one: download your code, start the runtime, and run your initialization logic. That setup is the cold start, and its cost varies widely by runtime.
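The practical consequence for code structure can be shown with a toy handler (hypothetical, runnable locally): work done at module scope is paid once per cold start and reused by every warm invocation, which is why SDK clients and connection pools are conventionally created there.

```python
import time

# Module scope runs once per execution environment (i.e., per cold start).
# Expensive setup (SDK clients, DB connections, config loads) belongs here.
INIT_TIME = time.monotonic()

def handler(event, context=None):
    # Runs on every invocation; warm calls reuse the module state above.
    return {"container_age_s": round(time.monotonic() - INIT_TIME, 3)}
```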
Cold Start Impact by Runtime
| Runtime | Typical Cold Start | Warm Invocation |
|---|---|---|
| Python | 200-500ms | < 5ms |
| Node.js | 200-400ms | < 5ms |
| Go | 50-100ms | < 1ms |
| Java | 1-5 seconds | < 5ms |
| .NET | 500ms-2s | < 5ms |
Real System: Image Processing Pipeline
An upload to an S3 bucket triggers a Lambda function that generates a thumbnail and writes it to a destination bucket. The pipeline scales from 0 to 1,000 concurrent image processing tasks without any capacity planning, and it costs nothing when no images are uploaded.
Implementation
AWS Lambda function (Python):
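A minimal sketch of the processing function, assuming the standard S3 put-event shape; the `-thumbnails` destination bucket is a hypothetical naming convention, and Pillow would be shipped via a Lambda layer. boto3 and Pillow are imported inside the handler so the sketch loads without AWS dependencies; in a real function you would import them and create the S3 client at module level so warm invocations reuse them.

```python
def thumbnail_key(key: str) -> str:
    """Map an upload key to its thumbnail key, e.g. uploads/cat.jpg -> thumbnails/cat.jpg."""
    name = key.rsplit("/", 1)[-1]
    return f"thumbnails/{name}"

def record_keys(event: dict) -> list:
    """Extract (bucket, key) pairs from an S3 object-created event."""
    return [
        (r["s3"]["bucket"]["name"], r["s3"]["object"]["key"])
        for r in event.get("Records", [])
    ]

def handler(event, context=None):
    import io
    import boto3                  # bundled in the Lambda Python runtime
    from PIL import Image         # shipped via a Lambda layer (assumption)

    s3 = boto3.client("s3")
    for bucket, key in record_keys(event):
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        img = Image.open(io.BytesIO(body)).convert("RGB")
        img.thumbnail((256, 256))  # resize in place, preserving aspect ratio
        out = io.BytesIO()
        img.save(out, format="JPEG")
        s3.put_object(
            Bucket=f"{bucket}-thumbnails",   # hypothetical destination bucket
            Key=thumbnail_key(key),
            Body=out.getvalue(),
            ContentType="image/jpeg",
        )
    return {"processed": len(record_keys(event))}
```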
Infrastructure as Code (AWS SAM):
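A sketch of how the wiring might look in a SAM template (resource and bucket names are hypothetical):

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Resources:
  UploadBucket:
    Type: AWS::S3::Bucket

  ThumbnailFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.handler          # module.function of the Lambda code
      Runtime: python3.12
      MemorySize: 512
      Timeout: 30                   # seconds; well under the 15-minute cap
      Events:
        ImageUpload:
          Type: S3                  # invoke on every object created in the bucket
          Properties:
            Bucket: !Ref UploadBucket
            Events: s3:ObjectCreated:*
```

SAM expands this into a CloudFormation stack, so a single deploy provisions the bucket, function, trigger, and IAM role together.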
Reducing cold starts with provisioned concurrency:
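Provisioned concurrency keeps a fixed number of execution environments initialized ahead of traffic, trading some idle cost for consistent latency. In SAM it attaches to a published alias (the alias name and count below are illustrative):

```yaml
  ThumbnailFunction:
    Type: AWS::Serverless::Function
    Properties:
      # Handler, Runtime, and event source omitted for brevity
      AutoPublishAlias: live        # provisioned concurrency requires a version/alias
      ProvisionedConcurrencyConfig:
        ProvisionedConcurrentExecutions: 5   # 5 environments kept warm (billed even when idle)
```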
Trade-offs
Benefits:
- Zero cost at zero traffic
- Automatic scaling from 0 to thousands of concurrent executions
- No server management, patching, or capacity planning
- Pay-per-use pricing aligns cost with business value
- Reduced operational overhead — no SSH, no monitoring of OS metrics
Costs:
- Cold start latency (problematic for user-facing APIs)
- Execution time limits (15 min on Lambda)
- Vendor lock-in (Lambda, Cloud Functions, Azure Functions have different APIs)
- Debugging and observability are harder (distributed traces required)
- More expensive than reserved instances at steady high throughput
Cost crossover point:
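A back-of-envelope model makes the crossover concrete. The prices below are illustrative assumptions (roughly Lambda's published us-east-1 list prices and about $30/month for a small on-demand instance), not quotes:

```python
# Back-of-envelope Lambda vs. fixed-instance cost model.
# Prices are illustrative assumptions, not quotes.
LAMBDA_PER_REQUEST = 0.20 / 1_000_000     # $ per invocation
LAMBDA_PER_GB_SECOND = 0.0000166667       # $ per GB-second of compute
SECONDS_PER_MONTH = 30 * 24 * 3600

def lambda_monthly_cost(req_per_sec, duration_ms=100, memory_gb=0.5):
    """Monthly Lambda bill for a steady request rate."""
    requests = req_per_sec * SECONDS_PER_MONTH
    gb_seconds = requests * (duration_ms / 1000) * memory_gb
    return requests * LAMBDA_PER_REQUEST + gb_seconds * LAMBDA_PER_GB_SECOND

INSTANCE_MONTHLY = 30.0  # hypothetical small always-on instance

for rps in (1, 10, 100):
    print(f"{rps:>4} req/s: Lambda ${lambda_monthly_cost(rps):,.2f}/mo "
          f"vs instance ${INSTANCE_MONTHLY:,.2f}/mo")
```

Under these assumptions, a 100 ms, 512 MB function crosses over near 11 requests per second of steady traffic; above that, a fixed instance (if it can absorb the load) is cheaper, and the gap widens linearly with volume.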
When to use serverless:
- Event-driven processing (file uploads, webhooks, queue consumers)
- APIs with unpredictable or bursty traffic patterns
- Scheduled jobs and cron tasks
- Prototypes and MVPs where speed-to-market matters
When to avoid serverless:
- Low-latency APIs where cold starts are unacceptable
- Long-running processes (video transcoding, ML training)
- Steady high-throughput workloads (containers or reserved instances are cheaper)
- Applications with large deployment artifacts (long cold starts)
Common Misconceptions
- "Serverless is always cheaper" — At steady high throughput, reserved instances cost 50-80% less. Serverless pricing favors sporadic workloads.
- "Cold starts are always a problem" — For async workloads (queue consumers, event processors), cold start latency is irrelevant. It only matters for synchronous user-facing APIs.
- "Serverless means stateless" — Functions are stateless between invocations, but you use external state stores (DynamoDB, S3, Redis). The architecture is stateless, not the application.
- "You cannot run complex applications on serverless" — Step Functions orchestrate multi-step workflows. API Gateway routes to multiple functions. The application can be arbitrarily complex — each function is simple.
- "Serverless eliminates DevOps" — It shifts operations from servers to configuration: IAM policies, API Gateway settings, VPC configurations, monitoring, and deployment pipelines still need expertise.
How This Appears in Interviews
- "Design a file processing pipeline" — Serverless is the ideal answer: S3 trigger, Lambda processing, SQS for fan-out. Explain why this is better than a polling server.
- "Your API has traffic spikes of 100x during sales events" — Serverless scales automatically. Discuss provisioned concurrency for consistent latency and SQS buffering for downstream protection.
- "Compare serverless vs containers" — Serverless: zero management, pay-per-use, cold starts. Containers: full control, consistent latency, cheaper at scale. Use both: serverless for glue logic, containers for core services.
- "How do you handle a 30-minute data processing job?" — Lambda has a 15-min limit. Use Step Functions to chain multiple Lambda invocations or switch to ECS/Fargate for long-running tasks.
Related Concepts
- Pub-Sub Pattern — event-driven communication between serverless functions
- Bulkhead Pattern — concurrency limits on Lambda act as bulkheads
- Retry with Exponential Backoff — essential for handling transient failures in serverless
- Twelve-Factor App — serverless naturally enforces many twelve-factor principles
- System Design Interview Guide