The Saga Pattern: Managing Distributed Transactions Without 2PC

In a monolith, you wrap related operations in a database transaction — either everything succeeds or everything rolls back. In microservices, each service has its own database. There's no shared transaction boundary. When creating an order requires reserving inventory (service A), charging payment (service B), and scheduling shipment (service C), how do you handle partial failures?

Two-phase commit (2PC) is the textbook answer. It is also the wrong answer for most microservice architectures: it requires every participant to be available in order to commit, holds locks across services until the coordinator decides, and scales poorly. The saga pattern is the practical alternative.

Saga Basics

A saga is a sequence of local transactions. Each local transaction updates its own service's database and publishes an event or message. If a step fails, the saga executes compensating transactions for all previously completed steps — undoing their effects.

Key insight: compensating transactions don't "undo" in the database ROLLBACK sense. They're new transactions that semantically reverse the effect. An inventory reservation is compensated by releasing the reservation. A payment charge is compensated by a refund.
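The core mechanic can be sketched in a few lines. This is a minimal in-memory illustration, not a production executor. `SagaStep` and `run_saga` are hypothetical names, and each action and compensation stands in for a local transaction in some service. When a step fails, the steps that already completed are compensated in reverse order.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SagaStep:
    """One local transaction paired with the compensating transaction that reverses it."""
    name: str
    action: Callable[[], None]
    compensation: Callable[[], None]


def run_saga(steps: List[SagaStep]) -> bool:
    """Run steps in order; if one fails, compensate completed steps newest-first."""
    completed: List[SagaStep] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            # The failed step's own local transaction is assumed to have rolled
            # back already, so only previously completed steps are compensated.
            for done in reversed(completed):
                done.compensation()
            return False
        completed.append(step)
    return True


# Usage: the payment step fails, so only the inventory reservation is released.
log: List[str] = []


def decline_payment() -> None:
    raise RuntimeError("payment declined")


ok = run_saga([
    SagaStep("reserve_inventory", lambda: log.append("reserve"), lambda: log.append("release")),
    SagaStep("charge_payment", decline_payment, lambda: log.append("refund")),
    SagaStep("schedule_shipment", lambda: log.append("ship"), lambda: log.append("cancel")),
])
print(ok, log)  # → False ['reserve', 'release']
```

The refund is never issued because the charge never committed. The shipment step never runs at all.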

Choreography: Event-Driven Sagas

Each service listens for events and decides what to do next. There is no central coordinator: services communicate only through events.
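A choreographed order saga might look roughly like the sketch below. It rests on several assumptions. `EventBus` is a hypothetical in-memory stand-in for a real broker. Handlers run synchronously, whereas a broker would deliver events asynchronously. The event names are illustrative, and the shipment step is omitted for brevity. Each service holds its own state and reacts only to events, including the failure event that triggers its compensation.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class EventBus:
    """Hypothetical in-memory stand-in for a message broker."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        for handler in self._subscribers[event_type]:
            handler(payload)


bus = EventBus()
order_status: Dict[str, str] = {}  # order service's local state
reserved: Dict[str, int] = {}      # inventory service's local state


# Order service: starts the saga and records its outcome.
def create_order(order_id: str, qty: int, amount: int) -> None:
    order_status[order_id] = "PENDING"
    bus.publish("OrderCreated", {"order_id": order_id, "qty": qty, "amount": amount})


def on_payment_completed(evt: dict) -> None:
    order_status[evt["order_id"]] = "CONFIRMED"


def on_inventory_released(evt: dict) -> None:
    order_status[evt["order_id"]] = "CANCELLED"


# Inventory service: reserves stock, and releases it if payment fails.
def on_order_created(evt: dict) -> None:
    reserved[evt["order_id"]] = evt["qty"]
    bus.publish("InventoryReserved", evt)


def on_payment_failed(evt: dict) -> None:
    reserved.pop(evt["order_id"], None)  # compensating transaction
    bus.publish("InventoryReleased", evt)


# Payment service: charges once inventory is reserved.
def on_inventory_reserved(evt: dict) -> None:
    if evt["amount"] > 100:  # hypothetical rule simulating a card decline
        bus.publish("PaymentFailed", evt)
    else:
        bus.publish("PaymentCompleted", evt)


bus.subscribe("OrderCreated", on_order_created)
bus.subscribe("InventoryReserved", on_inventory_reserved)
bus.subscribe("PaymentCompleted", on_payment_completed)
bus.subscribe("PaymentFailed", on_payment_failed)
bus.subscribe("InventoryReleased", on_inventory_released)

create_order("o1", qty=2, amount=50)
create_order("o2", qty=1, amount=500)
print(order_status)  # → {'o1': 'CONFIRMED', 'o2': 'CANCELLED'}
```

The overall flow exists only in the set of subscriptions. This is the "logic is scattered across services" trade-off described below.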


Choreography Trade-offs

Pros:

Simple to implement for small sagas (3-4 steps)
Loose coupling — services only know about events, not each other
No single point of failure

Cons:

Hard to understand the overall flow — logic is scattered across services
Difficult to add new steps (must update event listeners in multiple services)
No central place to see saga status or handle timeouts
Cyclic event dependencies can create infinite loops

Orchestration: Centralized Saga Coordinator

A saga orchestrator (sometimes called a Saga Execution Coordinator, or SEC) manages the flow. It sends commands to services and listens for their responses, deciding what to do next based on a state machine.
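One way to express that state machine is an explicit transition table keyed by (current state, reply). The sketch below assumes hypothetical state, command, and reply names and a pluggable `send` function in place of a real message transport. Compensation is simply another path through the table: a declined payment leads to a `ReleaseInventory` command.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (current_state, reply) -> (next_state, command to send or None).
# All names are hypothetical; "Begin" is the trigger that starts the saga.
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, Optional[str]]] = {
    ("START", "Begin"): ("RESERVING_INVENTORY", "ReserveInventory"),
    ("RESERVING_INVENTORY", "InventoryReserved"): ("CHARGING_PAYMENT", "ChargePayment"),
    ("RESERVING_INVENTORY", "InventoryUnavailable"): ("FAILED", None),
    ("CHARGING_PAYMENT", "PaymentCharged"): ("SCHEDULING_SHIPMENT", "ScheduleShipment"),
    ("CHARGING_PAYMENT", "PaymentDeclined"): ("RELEASING_INVENTORY", "ReleaseInventory"),
    ("SCHEDULING_SHIPMENT", "ShipmentScheduled"): ("COMPLETED", None),
    ("SCHEDULING_SHIPMENT", "ShipmentRejected"): ("REFUNDING_PAYMENT", "RefundPayment"),
    ("REFUNDING_PAYMENT", "PaymentRefunded"): ("RELEASING_INVENTORY", "ReleaseInventory"),
    ("RELEASING_INVENTORY", "InventoryReleased"): ("FAILED", None),
}


class OrderSagaOrchestrator:
    """Drives one saga instance: send a command, wait for the reply, transition."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self.send = send                     # transport to participant services (assumed)
        self.state = "START"
        self.history: List[str] = ["START"]  # queryable record of the saga's progress

    def handle(self, reply: str) -> None:
        key = (self.state, reply)
        if key not in TRANSITIONS:
            raise ValueError(f"unexpected reply {reply!r} in state {self.state}")
        self.state, command = TRANSITIONS[key]
        self.history.append(self.state)
        # In production, persist self.state before sending so that a
        # restarted orchestrator can resume the saga where it left off.
        if command is not None:
            self.send(command)


# Usage: payment is declined, so the orchestrator compensates the reservation.
sent: List[str] = []
saga = OrderSagaOrchestrator(send=sent.append)
for reply in ["Begin", "InventoryReserved", "PaymentDeclined", "InventoryReleased"]:
    saga.handle(reply)
print(saga.state)  # → FAILED
print(sent)        # → ['ReserveInventory', 'ChargePayment', 'ReleaseInventory']
```

Adding or reordering a step means editing one table rather than several services' event listeners. This is the main advantage listed below.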


Orchestration Trade-offs

Pros:

Clear overview of the saga flow in one place
Easy to add, remove, or reorder steps
Centralized error handling and compensation logic
Easy to query saga status

Cons:

The orchestrator is a single point of failure (mitigate with HA deployment)
Risk of the orchestrator becoming a "god service" that knows too much about other services
Tighter coupling between the orchestrator and participant services

Designing Compensating Transactions

Not every action is easily compensatable. Consider:

Action	Compensating Transaction	Complexity
Reserve inventory	Release reservation	Easy
Debit account	Credit account (refund)	Easy
Send email	Send correction/cancellation email	Possible but imperfect
Charge credit card	Refund (processing fees are often not returned)	Lossy
Ship physical package	Initiate return process	Hard, slow
Delete data	Restore from backup, if one exists	Impossible without a backup

Pivot vs retriable transactions: The pivot transaction is a saga's go/no-go point. Once it succeeds, the saga must run to completion. Steps before the pivot are compensatable. Steps after it are retriable: they must eventually succeed, so their failures are retried rather than compensated.
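The split between the two phases can be sketched as follows. This example makes three assumptions for illustration only. The payment charge is treated as the pivot. Retries are capped at `max_attempts` with linear backoff so the demo terminates. `Step` and `run_with_pivot` are hypothetical names. A failure at or before the pivot triggers compensation, while a failure after it triggers a retry.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    name: str
    action: Callable[[], None]
    # Only steps before the pivot need a compensation.
    compensation: Optional[Callable[[], None]] = None


def run_with_pivot(before: List[Step], pivot: Step, after: List[Step],
                   max_attempts: int = 5, backoff_s: float = 0.0) -> bool:
    """Compensate failures up to and including the pivot; retry after it."""
    done: List[Step] = []
    for step in before + [pivot]:
        try:
            step.action()
        except Exception:
            for finished in reversed(done):
                if finished.compensation is not None:
                    finished.compensation()
            return False
        done.append(step)
    # The pivot committed: the saga must now run to completion, so
    # retriable steps are retried rather than compensated.
    for step in after:
        for attempt in range(1, max_attempts + 1):
            try:
                step.action()
                break
            except Exception:
                if attempt == max_attempts:
                    raise  # a real system would keep retrying or alert an operator
                time.sleep(backoff_s * attempt)  # linear backoff between attempts
    return True


# Usage: the shipping service is down for the first two attempts.
calls = {"ship": 0}


def flaky_ship() -> None:
    calls["ship"] += 1
    if calls["ship"] < 3:
        raise ConnectionError("shipping service unavailable")


ok = run_with_pivot(
    before=[Step("reserve_inventory", lambda: None, lambda: None)],
    pivot=Step("charge_payment", lambda: None),
    after=[Step("schedule_shipment", flaky_ship)],
)
print(ok, calls["ship"])  # → True 3
```

Retriable steps are usually made idempotent, because a retry may repeat work that actually succeeded before its reply was lost.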

Timeout Handling

Sagas can get stuck: a service might be down, or a message might be lost. You need timeout detection so that a stalled saga is retried or compensated instead of waiting forever.
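A simple approach is to store a deadline with each in-flight saga and sweep periodically for sagas past their deadline. The sketch below makes these assumptions: `SagaTimeoutMonitor` and its methods are hypothetical, records live in a dict rather than the saga's database, and the clock is injected so the logic can be tested without waiting.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# States the monitor no longer watches: finished, or already handed off.
UNWATCHED = {"COMPLETED", "FAILED", "TIMED_OUT"}


@dataclass
class SagaRecord:
    saga_id: str
    state: str
    deadline: float  # absolute time by which the current step must reply


class SagaTimeoutMonitor:
    """Periodic sweeper; the clock is injected so the logic is testable."""

    def __init__(self, clock: Callable[[], float],
                 on_timeout: Callable[[SagaRecord], None]) -> None:
        self.clock = clock
        self.on_timeout = on_timeout
        self.sagas: Dict[str, SagaRecord] = {}  # would be a DB table in production

    def track(self, saga_id: str, state: str, timeout_s: float) -> None:
        # Call whenever a saga moves to a new step: this resets its deadline.
        self.sagas[saga_id] = SagaRecord(saga_id, state, self.clock() + timeout_s)

    def sweep(self) -> List[str]:
        now = self.clock()
        expired = [r for r in self.sagas.values()
                   if r.state not in UNWATCHED and now > r.deadline]
        for record in expired:
            record.state = "TIMED_OUT"
            self.on_timeout(record)  # e.g. retry the step or start compensation
        return [r.saga_id for r in expired]


# Usage with a fake clock.
now = [0.0]
timed_out: List[str] = []
monitor = SagaTimeoutMonitor(clock=lambda: now[0],
                             on_timeout=lambda r: timed_out.append(r.saga_id))
monitor.track("order-1", "CHARGING_PAYMENT", timeout_s=30)
monitor.track("order-2", "COMPLETED", timeout_s=30)
now[0] = 10.0
print(monitor.sweep())  # → []
now[0] = 31.0
print(monitor.sweep())  # → ['order-1']
```

Marking a record `TIMED_OUT` before invoking the handler keeps later sweeps from firing the same timeout twice.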


Choreography vs Orchestration: Decision Guide

For most production systems with more than 3 steps, orchestration is the better choice. The centralized view of saga state and explicit compensation logic outweighs the coupling trade-off. Use choreography for simple, well-understood flows where the team wants maximum decoupling.

2PC vs Saga: When Each Applies

Criteria	2PC	Saga
Consistency	Strong (ACID)	Eventual
Availability	Low (all participants must be up)	High (tolerates partial failure)
Latency	High (locks held from prepare until commit or abort)	Lower (no distributed locks)
Scalability	Poor (coordinator bottleneck)	Good
Use case	Databases within one datacenter	Microservices across a network

Use 2PC when you need strong consistency between two transactional resources in the same datacenter that both support it (e.g., writing to PostgreSQL and an XA-capable message queue). Use sagas for everything else in a microservice architecture.
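For contrast with the saga examples, here is a toy in-memory 2PC coordinator that makes the table's lock-holding row concrete. `Participant` and `two_phase_commit` are hypothetical names. Real resource managers durably persist the prepared state; PostgreSQL, for example, exposes this through `PREPARE TRANSACTION`. This sketch omits that persistence.

```python
from typing import List


class Participant:
    """Hypothetical resource manager (e.g. a database) taking part in 2PC."""

    def __init__(self, name: str, will_vote_yes: bool = True) -> None:
        self.name = name
        self.will_vote_yes = will_vote_yes
        self.locked = False
        self.outcome = "PENDING"

    def prepare(self) -> bool:
        # Phase 1: make the change durable but invisible, and keep its locks.
        if not self.will_vote_yes:
            return False
        self.locked = True
        return True

    def commit(self) -> None:
        self.outcome, self.locked = "COMMITTED", False

    def abort(self) -> None:
        self.outcome, self.locked = "ABORTED", False


def two_phase_commit(participants: List[Participant]) -> bool:
    # Phase 1 (prepare): every participant must vote yes.
    votes = [p.prepare() for p in participants]
    # Locks are held from here until phase 2. If the coordinator crashed at
    # this point, prepared participants would stay blocked until it recovered.
    if all(votes):
        for p in participants:  # Phase 2: commit everywhere
            p.commit()
        return True
    for p in participants:      # Phase 2: abort everywhere
        p.abort()
    return False


db, broker = Participant("orders_db"), Participant("message_broker")
print(two_phase_commit([db, broker]), db.outcome)  # → True COMMITTED

db2 = Participant("orders_db")
broker2 = Participant("message_broker", will_vote_yes=False)
print(two_phase_commit([db2, broker2]), db2.outcome)  # → False ABORTED
```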

The saga pattern accepts that distributed transactions can't provide the same guarantees as local transactions. Instead of pretending otherwise, it designs for partial failure with explicit compensation logic. The result is a system that's more resilient, more scalable, and honest about its consistency guarantees.
