CHALLENGE: You’re migrating a living, breathing monolith… without stopping the business
It’s 11:47 AM. Your checkout service is melting. Marketing just launched a “flash sale.” Support is screaming. The CEO is in a meeting promising “microservices by Q3.”
You have a monolith that:
The business constraint: you can’t pause feature work and you can’t take downtime.
You’re asked to “rewrite it as microservices.”
Interactive question (pause and think):
If you start a full rewrite, what’s the most likely outcome?
Pause. Think.
Reveal: B is the common outcome.
Full rewrites tend to fail because the old system keeps changing, the new system lags behind, and you end up with two incomplete truths.
Key insight:
The Strangler Fig Pattern is a strategy for incremental replacement: you build new capabilities around the edges of the old system, route traffic to the new pieces, and gradually “strangle” the old implementation until it can be removed.
Scenario: You inherit a 15-year-old “Order Management System” (OMS). It’s stable-ish, but every change is risky. You want to modernize it into services.
Interactive question (pause and think):
What does the Strangler Fig Pattern actually replace first?
Pause and think.
Reveal: C.
Analogy: The strangler fig tree
A strangler fig starts by growing around an existing tree. Over time, it forms a lattice that takes over light and nutrients. Eventually, the original tree dies and the fig remains.
In software:
Real-world parallel: A delivery service swapping warehouses
Imagine a delivery company with one old warehouse (slow, cramped). You build a new warehouse for “high-volume items” first. You route those items to the new warehouse, while everything else still ships from the old one. Over time, more categories move.
Mental model
The pattern is not “rewrite.” It is:
Key insight box:
Strangler Fig = controlled traffic migration at the system boundary + incremental functional replacement.
Challenge questions
Scenario: You plan to split the monolith into services. But once you do, you introduce:
Interactive question (pause and think):
Which is the biggest new class of problems you introduce when strangling into microservices?
Reveal: B.
Analogy: Coffee shop with one barista vs. a team
One barista (monolith) makes every drink. If they’re slow, the whole line slows, but coordination is simple.
A team (microservices) splits tasks: espresso, milk, checkout. Now:
Mental model: The “failure surface area” curve
As you adopt Strangler Fig:
Key insight box:
In distributed environments, Strangler Fig is a traffic-shaping + failure-management exercise as much as it is a refactoring strategy.
Challenge questions
Scenario: Your monolith serves:
You need a “choke point” to redirect traffic.
Decision game: Which statement is true?
Pause and think.
Reveal: 2 is generally true.
Explanation
The most common strangling boundary is:
If you can intercept traffic at a single point, you can gradually shift load.
Common misconception
“Strangler Fig means you must rewrite the database last.”
Reality: database migration is often the hardest part. Sometimes you migrate data early for a slice, sometimes late; the pattern is flexible but data is the constraint.
Key insight box:
Pick a boundary where you can route, observe, and roll back.
Challenge questions
Scenario: You want to move “Order History” out of the monolith first.
Phase 1: Intercept
You add a routing layer that can decide:
/orders/{id} -> monolith
/orders/{id}/history -> new service
[IMAGE: Architecture diagram showing clients -> gateway -> (monolith OR new service) with a routing rule for a subset of endpoints. Include observability sidecars/metrics/tracing.]
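The routing decision in Phase 1 can be sketched in a few lines. This is a minimal illustration, not a gateway implementation: the upstream names `monolith` and `history-svc` and the function `choose_upstream` are hypothetical, and a real setup would express the same rules in your gateway or proxy configuration.

```python
import re

# Hypothetical upstream names; the article only says "monolith" and "new service".
MONOLITH = "monolith"
HISTORY_SVC = "history-svc"

# Ordered rules: first match wins, so the more specific history path comes first.
ROUTES = [
    (re.compile(r"^/orders/[^/]+/history$"), HISTORY_SVC),
    (re.compile(r"^/orders/[^/]+$"), MONOLITH),
]


def choose_upstream(path: str) -> str:
    """Return the upstream for a request path, defaulting to the monolith."""
    for pattern, upstream in ROUTES:
        if pattern.match(path):
            return upstream
    # Anything not explicitly migrated keeps its current behavior.
    return MONOLITH
```

Defaulting to the monolith means a path you have not explicitly migrated is never sent to the new service by accident.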
Phase 2: Implement a vertical slice
A vertical slice includes:
Not just “a microservice skeleton.”
Phase 3: Migrate traffic and retire
You gradually increase routing percentage or route by feature.
Interactive question (pause and think):
Which traffic migration strategy is safest for distributed systems?
Reveal: B.
Analogy: Restaurant menu rollout
A restaurant introduces a new kitchen station for “vegan dishes.” They don’t switch the whole menu; they route a subset of orders to the new station, then expand.
Key insight box:
Strangler Fig works best with controlled cohorts and fast rollback.
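One way to get controlled cohorts with fast rollback is to hash a stable identifier into a bucket, so a given user always lands on the same side, and to keep a kill switch that sends everyone back to the monolith. A minimal sketch, assuming a string `user_id` is available at the routing layer; `bucket`, `use_new_service`, the salt value, and the `kill_switch` flag are illustrative names, not part of any specific gateway.

```python
import hashlib


def bucket(user_id: str, salt: str = "history-rollout") -> int:
    """Map a user to a stable bucket in [0, 100) so they always get the same side."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % 100


def use_new_service(user_id: str, percent: int, kill_switch: bool = False) -> bool:
    """Route to the new service if the user falls inside the rollout percentage."""
    if kill_switch:  # rollback path: one flag flip sends everyone to the monolith
        return False
    return bucket(user_id) < percent
```

Because the bucket depends only on the salt and the user ID, raising the percentage from 10 to 50 keeps the original 10% on the new service, so users do not flap between implementations as the rollout grows.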
Challenge questions
Scenario: You route /orders/{id}/history to a new service. It calls the monolith for “order metadata” because you didn’t migrate that yet.
Now you have a distributed call chain: Client -> Gateway -> HistorySvc -> Monolith
Failure scenario A: Monolith is slow
HistorySvc times out waiting for monolith.
Pause and think:
Should HistorySvc retry?
Reveal: C.
Explanation: Retries amplify load (retry storms). Use:
Failure scenario B: Split-brain business logic
You accidentally implement “refund eligibility” differently in the new service.
Common misconception
“If endpoints are different, business rules can drift safely.”
Reality: customers experience workflows, not endpoints. Drifts create inconsistent outcomes.
Failure scenario C: Inconsistent data reads
Monolith reads from DB A. New service reads from DB B (or a replica). You route partial traffic.
This can create:
Failure scenario D: Observability blind spots
You can’t tell whether errors come from the gateway, new service, or monolith.
[IMAGE: Trace waterfall view showing request across gateway, service, monolith with latency and error annotations.]
Key insight box:
Strangling creates mixed call graphs. Treat observability and failure handling as first-class features.
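For Failure scenario A, one common guard against retry storms is a circuit breaker: after repeated failures, HistorySvc stops calling the monolith for a while and serves a degraded response instead. The sketch below is illustrative only. The class, its thresholds, and the fallback shape are assumptions, and a production system would more likely rely on a maintained resilience library or a service-mesh policy.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Open after `max_failures` consecutive failures; probe again after `reset_after` seconds."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock  # injectable so tests do not need to sleep
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], dict], fallback: Callable[[], dict]) -> dict:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()  # fail fast: add no load to a struggling monolith
            # Cool-down elapsed: fall through and let one trial call probe the dependency.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            return fallback()
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

The fallback is where the product decision lives: for order history, returning the history without metadata and a "degraded" marker may be acceptable, while for payments it probably is not.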
Challenge questions
Scenario: You need to move /payments next. This is high risk.
Strategy 1: Path-based routing
Simple rules:
/v2/payments/** -> PaymentSvc
Pros: easy reasoning. Cons: clients must change paths, or gateway must rewrite.
Strategy 2: Header/flag-based routing
X-Use-New-Payments: true -> new
Pros: controlled experiments. Cons: clients (or gateway) must inject flags.
Strategy 3: Cohort/tenant routing
Pros: stable user experience per tenant. Cons: operational complexity if tenants vary in behavior.
Strategy 4: Percentage-based canary
Pros: catches unknown unknowns. Cons: harder to debug user reports (“sometimes it fails”).
Strategy 5: Shadow traffic (dark launch)
Send a copy of requests to new service, but don’t use response.
Pros: validate performance/correctness. Cons: must avoid side effects.
Interactive matching exercise:
Match the routing strategy to the best use case:
| Strategy | Use case |
|---|---|
| Path-based | (A) Validate correctness without impacting users |
| Cohort-based | (B) High-risk migration with easy rollback |
| Shadow traffic | (C) New API version adoption |
| Percentage canary | (D) Avoid “sometimes” behavior for a tenant |
Pause and think.
Reveal:
Path-based -> (C)
Cohort-based -> (D)
Shadow traffic -> (A)
Percentage canary -> (B)
Key insight box:
Choose routing based on debuggability and blast radius, not ideology.
Challenge questions
Scenario: You moved “Order History API” but the monolith still owns the orders table. Your new service needs history data and must not corrupt the monolith.
The core tension
Strangler Fig is easy at the API layer; it’s hard at the data ownership layer.
Three data migration patterns (often combined)
Pros:
Cons:
Pros:
Cons:
Pros:
Cons:
Common misconception
“Dual-write is fine if we use retries.”
Reality: retries don’t fix split-brain writes. You need idempotency keys, transactional outbox, or a single writer.
[IMAGE: Diagram showing monolith DB, outbox table, CDC pipeline (Debezium/Kafka), new service DB, and consumers.]
Interactive question (pause and think)
If you can only pick one principle to keep migrations sane, which is best?
Reveal: B.
Key insight box:
Strangler Fig succeeds when you establish clear data ownership and use reliable propagation (outbox/CDC) during transition.
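Outbox and CDC pipelines usually deliver events at least once, so the consuming side of the new service needs to be idempotent. A minimal sketch, assuming each event carries a unique `event_id`; the `HistoryProjection` class and the event fields are hypothetical, and in a real service the dedupe record and the projection update would be committed in the same database transaction.

```python
from typing import Dict, List, Set


class HistoryProjection:
    """Hypothetical read model in the new service, built from propagated order events."""

    def __init__(self) -> None:
        self.seen_event_ids: Set[str] = set()
        self.history_by_order: Dict[int, List[str]] = {}

    def apply(self, event: dict) -> bool:
        """Apply an event once; return False if it was a duplicate delivery."""
        if event["event_id"] in self.seen_event_ids:
            return False  # redelivery: skip so the history is not double-counted
        self.history_by_order.setdefault(event["order_id"], []).append(event["status"])
        self.seen_event_ids.add(event["event_id"])
        return True
```

Without the dedupe check, a redelivered event would append the same status twice and the new service's history would drift from the monolith's.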
Challenge questions
Scenario: You have these candidates to extract:
Decision game: Which is the best first slice for Strangler Fig in a distributed environment?
Pick one and justify:
Pause and think.
Reveal (typical answer): B.
Why “read-mostly” slices are common first moves
But: if your biggest pain is auth, you might start there—Strangler Fig is context-dependent.
Key insight box:
Start with a slice that has low coupling, low write complexity, and high learning value.
Challenge questions
Scenario: You route 10% of /orders/{id}/history to HistorySvc. Customers report “missing items,” but only sometimes.
What you need before serious strangling
Interactive question (pause and think)
Which metric most quickly reveals that the new service is harming users?
Reveal: B.
Mental model: “Split traffic = split truth”
You must answer:
Technique: Differential comparison
For read endpoints, you can:
Key insight box:
Without strong observability, Strangler Fig becomes guess-and-pray.
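Differential comparison for a read endpoint can be as simple as fetching the same resource from both implementations and reporting the fields that disagree. The sketch below assumes both responses are flat dictionaries and that a few fields are expected to differ; `diff_responses` and the names in `IGNORED_FIELDS` are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Fields expected to differ between implementations (hypothetical names).
IGNORED_FIELDS = {"served_by", "generated_at"}


def diff_responses(old: Dict[str, Any], new: Dict[str, Any]) -> List[Tuple[str, Any, Any]]:
    """Return (field, old_value, new_value) for each top-level field that differs."""
    mismatches = []
    # Union of keys, so a field missing on either side is reported too.
    for key in sorted((old.keys() | new.keys()) - IGNORED_FIELDS):
        if old.get(key) != new.get(key):
            mismatches.append((key, old.get(key), new.get(key)))
    return mismatches
```

Applied to the “missing items” report, a comparison like this turns "sometimes it fails" into a concrete list of order IDs and fields where HistorySvc and the monolith disagree.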
Challenge questions
Scenario: You extracted PaymentSvc. Checkout now calls:
One request becomes a saga.
Pause and think
Which statement is true?
Reveal: 2.
Analogy: Restaurant order with multiple stations
If dessert is out of stock after you already cooked the main dish, you don’t rewind time; you offer a substitute or refund. That’s compensation.
Distributed-systems mechanics
[IMAGE: Saga diagram showing steps with success path and compensation path.]
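The success and compensation paths of an orchestrated saga can be sketched as a loop that runs steps in order and, on failure, undoes the completed steps in reverse. This is a minimal in-process illustration: `run_saga` and the step tuple shape are assumptions, and a real orchestrator would also persist saga state so it can resume after a crash.

```python
from typing import Callable, List, Tuple

# (name, action, compensation)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order; on failure, compensate completed steps in reverse order."""
    log: List[str] = []
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            log.append(f"{name}:failed")
            # Undo only what actually completed, newest first.
            for done_name, _, compensate in reversed(completed):
                compensate()  # compensations should be idempotent and retryable
                log.append(f"{done_name}:compensated")
            return False, log
        completed.append(step)
        log.append(f"{name}:ok")
    return True, log
```

As in the restaurant analogy, compensation is a new forward action (a refund, a release of reserved stock), not a rollback of history.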
Common misconception
“Strangler Fig is just routing; transactions don’t change.”
Reality: as soon as you split writes, you must redesign workflow consistency.
Key insight box:
Strangling write paths forces you to choose: orchestration, choreography, or keep the write in the monolith longer.
Challenge questions
Scenario: After months, 80% of traffic is handled by new services. The monolith still contains:
Decision game: Which retirement plan is most realistic?
Reveal: C.
The “last 20%” problem
The last pieces are often:
Retirement checklist
Key insight box:
Retiring the monolith is a product + ops project, not just engineering.
Challenge questions
Scenario: You’re in a design review. You hear these statements.
Misconception 1: “Strangler Fig = microservices”
Strangler Fig is a migration approach. You can strangle into:
Misconception 2: “We’ll just add an API gateway and we’re done”
The gateway enables routing, but you still must handle:
Misconception 3: “We can ignore the monolith once traffic is low”
Low-traffic endpoints can still be high criticality (admin, refunds, compliance).
Misconception 4: “Event-driven solves coupling automatically”
Events reduce synchronous coupling but introduce:
Key insight box:
Strangler Fig is a socio-technical pattern: it changes org coordination, on-call, and deployment practices.
Challenge questions
Scenario: Leadership asks: “Is Strangler Fig always the best migration approach?”
When Strangler Fig is a strong fit
When Strangler Fig is risky or expensive
Comparison table: Strangler Fig vs Big Bang Rewrite vs Modular Monolith Refactor
| Approach | Downtime risk | Parallel run complexity | Learning curve | Data migration difficulty | Typical failure mode |
|---|---|---|---|---|---|
| Strangler Fig | Low | Medium-High | Medium | High | Never finishing; hybrid spaghetti |
| Big bang rewrite | High | High | High | High | Rewrite abandoned; feature lag |
| Modular monolith refactor | Low-Medium | Low | Medium | Medium | Doesn’t address scaling/org needs |
Interactive question (pause and think):
Which approach minimizes distributed-systems complexity?
Reveal: C often minimizes distributed complexity, though it may not meet all goals.
Key insight box:
Strangler Fig buys migration safety at the cost of temporary architectural complexity.
Challenge questions
Scenario: You want to know how this looks in practice.
Pattern A: Gateway + BFF + extracted read services
Pattern B: Event outbox + CDC to build new domain stores
Pattern C: “Carve out” a high-change domain
Pattern D: Strangle by message bus (not HTTP)
[IMAGE: Four mini-architectures showing patterns A-D.]
Key insight box:
Many successful stranglings start with reads + events before moving write ownership.
Challenge questions
Scenario: You are the tech lead. You must propose a 90-day strangler plan.
Step 1: Pick a boundary
Choose one:
Write down: What is your boundary and why?
Step 2: Pick your first slice
Constraints:
Write down: What slice and what is the success metric?
Step 3: Choose routing strategy
Pick:
Write down: What is the rollback mechanism?
Step 4: Data plan
Pick:
Write down: Who is the single writer during transition?
Step 5: Failure plan
Define:
Write down: What happens when the new service is down?
Key insight box:
A strangler plan is not a diagram—it’s a traffic + data + failure contract.
Challenge questions
Scenario: You’re about to route 30% of checkout traffic to a new PaymentSvc.
Answer these in order. Pause and think before revealing.
Q1) Where is your choke point?
Reveal: B.
Q2) What is your rollback time?
Reveal: B.
Q3) Who writes “payment authorization” truth during migration?
Reveal: B.
Q4) What’s your plan for partial failure?
Reveal: B.
Q5) How do you know the new service is correct?
Reveal: B.
Final key insight box:
The Strangler Fig Pattern is successful when you treat migration as traffic engineering + data ownership + failure design, not just refactoring.