Zero-Downtime Deployments: Blue-Green, Canary, and Rolling Strategies

Deployment strategies for zero downtime with Kubernetes examples, database migration patterns, feature flags, and rollback procedures.

Akhil Sharma

March 9, 2026

10 min read

Deploying without downtime isn't just about the deployment strategy — it's about how your application handles the transition between versions. Database migrations, in-flight requests, and connection draining all need consideration. The deployment strategy is the easy part.

Rolling Deployments

The default in Kubernetes. Old pods are gradually replaced with new pods. At any point during the deployment, both old and new versions serve traffic.

Key settings:

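  • maxUnavailable: 0 ensures capacity never drops below the desired replica count; it must be paired with a maxSurge of at least 1, otherwise the rollout cannot make progress
  • readinessProbe prevents traffic from hitting pods that aren't ready
  • preStop hook delays shutdown so the pod can leave load balancer rotation and in-flight requests can complete before the process is stopped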
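
The settings above can be sketched as a manifest. The following builds one as a Python dictionary and serializes it to JSON, which kubectl accepts in place of YAML. The image name, port 8080, the /healthz path, the probe period, and the 45-second grace period are illustrative assumptions rather than values from the original configuration, and the sleep-based hook assumes the image ships a shell.

```python
import json


def rolling_deployment(name: str, image: str, replicas: int) -> dict:
    """Build an apps/v1 Deployment manifest that rolls out without losing capacity."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "strategy": {
                "type": "RollingUpdate",
                # Add one extra pod at a time; never remove a ready pod first.
                "rollingUpdate": {"maxSurge": 1, "maxUnavailable": 0},
            },
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    # Assumed value: must exceed the preStop sleep plus drain time.
                    "terminationGracePeriodSeconds": 45,
                    "containers": [{
                        "name": name,
                        "image": image,
                        "ports": [{"containerPort": 8080}],
                        # Pod receives traffic only after this probe succeeds.
                        "readinessProbe": {
                            "httpGet": {"path": "/healthz", "port": 8080},
                            "periodSeconds": 5,
                        },
                        # Runs before SIGTERM is sent to the container.
                        "lifecycle": {
                            "preStop": {"exec": {"command": ["sh", "-c", "sleep 10"]}},
                        },
                    }],
                },
            },
        },
    }


manifest = rolling_deployment(
    "order-service", "registry.example.com/order-service:v2", replicas=4
)
manifest_json = json.dumps(manifest, indent=2)
```

Writing manifest_json to a file and passing it to kubectl apply -f should behave the same as the equivalent YAML.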

Risk: Both versions serve traffic simultaneously during the rollout. If v2 has a breaking change, some users see v2 while others see v1. This requires backward-compatible changes.

Blue-Green Deployments

Run two identical environments. "Blue" is the current production environment. "Green" is the new version. Once green passes health checks, switch all traffic from blue to green.

Implementation with Kubernetes Services:

Switch by updating the Service selector from version: blue to version: green. Rollback by switching back.

Advantage: Instant rollback. The old version is still running — just switch the selector back.

Disadvantage: Requires double the infrastructure during deployment. With 4 replicas, you need 8 running during the transition.
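
The selector switch described above can be sketched as a small helper that builds the kubectl patch command. This is an illustration under assumptions: the Service name order-service and the version label values are taken as given, and the helper names are hypothetical. It relies on kubectl patch using a strategic merge by default, so only the version key of the selector changes.

```python
import json


def selector_patch(target: str) -> dict:
    """Patch body that points the Service at the 'blue' or 'green' pods."""
    if target not in ("blue", "green"):
        raise ValueError(f"unknown environment: {target!r}")
    return {"spec": {"selector": {"version": target}}}


def switch_command(service: str, target: str) -> list[str]:
    """argv for kubectl patch, suitable for subprocess.run(argv, check=True)."""
    return ["kubectl", "patch", "service", service,
            "-p", json.dumps(selector_patch(target))]


# Cut over to green; the rollback is the same command with the other label.
cutover = switch_command("order-service", "green")
rollback = switch_command("order-service", "blue")
```

Because cutover and rollback are the same operation with different arguments, the rollback path is exercised every time a release goes out.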

Canary Deployments

Route a small percentage of traffic to the new version. Monitor error rates and latency. Gradually increase traffic if metrics are healthy. Roll back if they degrade.

Using Argo Rollouts for automated canary:

This automatically promotes the canary if the success rate stays above 99%, and rolls back if it drops below.
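
The promote-or-roll-back decision can be sketched in a few lines. This is not how Argo Rollouts is implemented; it only models the logic, and the weight schedule and the success_rate callback are hypothetical stand-ins for a real traffic router and metrics query.

```python
from typing import Callable, Sequence


def run_canary(
    weights: Sequence[int],
    success_rate: Callable[[int], float],
    threshold: float = 0.99,
) -> tuple[str, int]:
    """Walk through traffic weights and abort on the first unhealthy step.

    Returns ("promoted", 100) or ("rolled_back", weight_at_failure).
    """
    for weight in weights:
        # A real rollout would shift traffic here and wait for metrics.
        if success_rate(weight) < threshold:
            return ("rolled_back", weight)
    return ("promoted", 100)


def healthy(weight: int) -> float:
    return 0.999


def degrades_under_load(weight: int) -> float:
    # Simulated failure that only shows up at 50% traffic or more.
    return 0.999 if weight < 50 else 0.97


steps = [10, 25, 50, 100]
good_result = run_canary(steps, healthy)
bad_result = run_canary(steps, degrades_under_load)
```

The second run stops at the 50% step, which illustrates why several small steps are worth the extra rollout time: the failure is contained to half of the traffic.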

Database Migrations Without Downtime

The deployment strategy is the easy part. Database migrations are where zero-downtime deployments actually break.

The problem: During a rolling deployment, both v1 and v2 code run simultaneously. If v2 requires a schema change, v1 code might break against the new schema.

The solution: Expand-Contract pattern.

Phase 1: Expand (backward-compatible)

Add new columns/tables without removing or renaming existing ones. Both v1 and v2 work with the expanded schema.
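
As a minimal sketch of the expand phase, the following uses Python's built-in sqlite3 module with an in-memory database. The users table and the name and full_name columns are hypothetical; a production database would use its own migration tooling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (name) VALUES ('Ada Lovelace')")

# Expand: add a nullable column; nothing is removed or renamed.
conn.execute("ALTER TABLE users ADD COLUMN full_name TEXT")

# v1 code does not know about full_name and keeps working unchanged.
conn.execute("INSERT INTO users (name) VALUES ('Alan Turing')")

# v2 code writes both columns so v1 readers still find a value in name.
conn.execute(
    "INSERT INTO users (name, full_name) VALUES (?, ?)",
    ("Grace Hopper", "Grace Hopper"),
)

rows = conn.execute("SELECT name, full_name FROM users ORDER BY id").fetchall()
```

Rows written by v1 have no full_name yet, which is the gap the backfill phase closes.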

Phase 2: Migrate Data

Backfill the new column with data from existing columns:
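
One way to do this without holding long locks is to backfill in small batches. The sketch below again uses sqlite3 and the hypothetical users table; the batch size is an arbitrary assumption and the right value depends on the database and its load.

```python
import sqlite3


def backfill_full_name(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Copy name into full_name in small batches; returns rows updated.

    Assumes name is NOT NULL, so every updated row leaves the NULL set
    and the loop terminates.
    """
    total = 0
    while True:
        with conn:  # commit after each batch so locks are held briefly
            cur = conn.execute(
                """
                UPDATE users SET full_name = name
                WHERE id IN (
                    SELECT id FROM users WHERE full_name IS NULL LIMIT ?
                )
                """,
                (batch_size,),
            )
        if cur.rowcount == 0:
            return total
        total += cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, full_name TEXT)"
)
conn.executemany(
    "INSERT INTO users (name) VALUES (?)", [(f"user-{i}",) for i in range(25)]
)
updated = backfill_full_name(conn, batch_size=10)
```

The function is safe to rerun: it only touches rows that are still NULL, so an interrupted backfill can simply be started again.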

Phase 3: Contract (remove old)

After all instances are running v2 and the new column is populated, remove the old column in a future deployment:
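
Both preconditions can be checked before the contract step runs. The sketch below is illustrative: how the set of running versions is obtained is left open (it is a plain argument here), and the table, column names, and CONTRACT_SQL statement are hypothetical.

```python
import sqlite3

# Hypothetical contract statement; run it only once the check below passes.
CONTRACT_SQL = "ALTER TABLE users DROP COLUMN name"


def safe_to_contract(conn: sqlite3.Connection, running_versions: set[str]) -> bool:
    """True only when every instance runs v2 and no row still lacks full_name."""
    if running_versions != {"v2"}:
        return False
    (missing,) = conn.execute(
        "SELECT COUNT(*) FROM users WHERE full_name IS NULL"
    ).fetchone()
    return missing == 0


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, full_name TEXT)")
conn.execute("INSERT INTO users (name, full_name) VALUES ('Ada', 'Ada')")
conn.execute("INSERT INTO users (name) VALUES ('Alan')")

blocked_by_data = safe_to_contract(conn, {"v2"})      # one row not backfilled yet
conn.execute("UPDATE users SET full_name = name WHERE full_name IS NULL")
blocked_by_v1 = safe_to_contract(conn, {"v1", "v2"})  # a v1 instance still runs
ready = safe_to_contract(conn, {"v2"})
```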

Rules:

  • Never rename a column in a single deployment. Add the new column (expand), deploy, migrate data, drop the old column (contract) in the next deployment.
  • Never add a NOT NULL column without a DEFAULT in a single step. Add it as nullable first, backfill, then add the constraint.
  • Never drop a column that running code still reads.

Graceful Shutdown

When a pod is terminated, in-flight requests must complete before the process exits.

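The Kubernetes pod lifecycle for graceful shutdown:

  1. The pod is marked Terminating and, in parallel, is removed from the Service's endpoints.
  2. The preStop hook runs (here, sleep 10).
  3. When the hook returns, the kubelet sends SIGTERM to the container.
  4. The application stops accepting new requests, finishes in-flight ones, and exits.
  5. If the process is still running when terminationGracePeriodSeconds expires (30 seconds by default, counted from the start of termination), the kubelet sends SIGKILL.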

The sleep 10 in the preStop hook is critical. There's a race between the pod being removed from endpoints and the load balancer updating its target list. Without the sleep, the load balancer might still send requests to a pod that's already shutting down.
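
On the application side, graceful shutdown comes down to two things: refuse new requests once SIGTERM arrives, and wait for in-flight ones to finish. The following is a framework-independent sketch; the RequestTracker class and its method names are hypothetical, and a real server would call try_start and finish from its request-handling middleware.

```python
import signal
import threading


class RequestTracker:
    """Counts in-flight requests and refuses new ones once shutdown begins."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._in_flight = 0
        self._shutting_down = False

    def try_start(self) -> bool:
        """Call when a request arrives; False means reject it (e.g. HTTP 503)."""
        with self._cond:
            if self._shutting_down:
                return False
            self._in_flight += 1
            return True

    def finish(self) -> None:
        """Call when a request completes, whether it succeeded or failed."""
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def begin_shutdown(self) -> None:
        # Plain attribute write with no lock, so it is safe in a signal handler.
        self._shutting_down = True

    def drain(self, timeout: float) -> bool:
        """Block until in-flight requests finish; False if the timeout expires."""
        with self._cond:
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout)


def install_sigterm_handler(tracker: RequestTracker) -> None:
    """Register the handler; Python requires this to run in the main thread."""
    signal.signal(signal.SIGTERM, lambda signum, frame: tracker.begin_shutdown())
```

After the handler fires, the main thread would call drain with a timeout shorter than the remaining grace period and then exit, so the process stops on its own terms before SIGKILL arrives.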

Feature Flags for Deployment Safety

Decouple deployment from release. Deploy v2 code to production but keep new features behind flags. Enable features gradually after deployment is verified.

This lets you:

  • Deploy code changes without user-facing impact
  • Enable features for specific users (internal team, beta users)
  • Instantly disable a broken feature without a rollback deployment
  • Run A/B tests on the same deployment
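
A minimal in-process sketch of these capabilities is shown below. The Flag fields, the new_checkout flag, and the user IDs are hypothetical; a production system would typically load flags from a flag service or config store rather than a module-level dictionary.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Flag:
    enabled: bool = True   # kill switch: False disables the feature for everyone
    percentage: int = 0    # share of users (0-100) in the gradual rollout
    allow_users: set[str] = field(default_factory=set)  # e.g. internal team


def bucket(flag_name: str, user_id: str) -> int:
    """Stable bucket in [0, 100) so a user gets the same answer every time."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % 100


def is_enabled(flags: dict[str, Flag], flag_name: str, user_id: str) -> bool:
    flag = flags.get(flag_name)
    if flag is None or not flag.enabled:
        return False  # unknown or killed flags are always off
    if user_id in flag.allow_users:
        return True
    return bucket(flag_name, user_id) < flag.percentage


flags = {"new_checkout": Flag(percentage=10, allow_users={"qa-1"})}
```

Hashing the flag name together with the user ID keeps each user's experience stable across requests while giving different flags independent rollout populations.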

Combining feature flags with canary deployments is the safest approach. The canary validates infrastructure stability (the new code doesn't crash), while feature flags control feature exposure independently.

Rollback Checklist

When a deployment goes wrong:

  1. Automated rollback: Argo Rollouts or Flagger detects metric degradation and rolls back automatically
  2. Manual rollback: kubectl rollout undo deployment/order-service (Kubernetes keeps previous ReplicaSets, up to revisionHistoryLimit, which defaults to 10)
  3. Verify rollback: check that error rates return to baseline after the rollback
  4. Database state: if a migration ran, is the old code compatible with the new schema? (This is why expand-contract matters.)
  5. Post-mortem: why did the canary not catch the issue? Was the analysis template checking the right metrics?
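
The manual rollback and its verification can be scripted so they are not typed from memory during an incident. The helper below is a sketch with a hypothetical name and an assumed 120-second timeout; it only builds the kubectl commands and leaves running them to the caller.

```python
from typing import List, Optional


def rollback_commands(
    deployment: str,
    revision: Optional[int] = None,
    timeout_s: int = 120,
) -> List[List[str]]:
    """argv lists: undo the rollout, then wait for the rollback to finish."""
    target = f"deployment/{deployment}"
    undo = ["kubectl", "rollout", "undo", target]
    if revision is not None:
        # Roll back to a specific revision instead of the previous one.
        undo.append(f"--to-revision={revision}")
    status = ["kubectl", "rollout", "status", target, f"--timeout={timeout_s}s"]
    return [undo, status]


commands = rollback_commands("order-service")
```

Each entry can be passed to subprocess.run(argv, check=True); the status command exits non-zero if the rollback does not complete within the timeout.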

Zero-downtime deployments are a system property, not a deployment strategy choice. The strategy (rolling, blue-green, canary) is one piece. Backward-compatible database migrations, graceful shutdown, readiness probes, and feature flags are equally important. Get all of them right, and deployments become routine. Miss any one, and you're one bad deploy away from an outage.
