Audience: observability engineers and SREs managing distributed tracing in high-throughput production systems.
This article assumes your distributed tracing captures 1% of requests via head-based sampling.
Then an incident happens, and you're flying blind: your sampling threw away the evidence.
What's wrong with "sample 1% of all requests randomly"?
Take 10 seconds.
Answer: the core problem is that the sampling decision is made before the request's outcome is known, which in turn means the rare failures you most need to see are almost never captured.
Random head-based sampling decides at the START of a request whether to trace it. But you don't know if the request will fail or be slow until it COMPLETES.
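The timing problem is easy to make concrete. Below is a minimal sketch of a head-based sampler (the function name and the hash-modulo scheme are illustrative; real SDKs typically derive the decision from the trace ID's random bits):

```python
import hashlib

def head_based_decision(trace_id: str, rate: float = 0.01) -> bool:
    """Decide at request START whether to record this trace.

    The outcome (error? slow?) is unknown at this point, so a failing
    request has exactly the same 1% chance of being kept as a healthy one.
    """
    # Hash the trace ID so every service in the call chain reaches the
    # same keep/drop decision without coordination.
    h = int(hashlib.sha256(trace_id.encode()).hexdigest(), 16)
    return (h % 10_000) < rate * 10_000
```

With a 0.1% error rate and 1% sampling, you capture roughly 10 error traces per million requests, while about 990 error traces are discarded unseen.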
Imagine a bank with security cameras that randomly record 1% of the day:
Head-based sampling: Decide at midnight which 1% of today to record. Might miss the robbery at 2 PM.
Tail-based sampling: Record everything temporarily, then at end of day, keep only footage with interesting events (robbery, suspicious activity). Delete boring footage.
Tail-based sampling makes sampling decisions AFTER seeing the complete trace, keeping important traces (errors, slow requests) while discarding boring ones.
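In code, the contrast is a decision function that runs only after the trace has finished. A minimal sketch, where the `Span` shape, threshold, and baseline rate are all illustrative assumptions:

```python
import random
from dataclasses import dataclass

@dataclass
class Span:
    duration_ms: float
    is_error: bool

def tail_based_decision(spans: list[Span],
                        latency_threshold_ms: float = 500.0,
                        baseline_rate: float = 0.01) -> bool:
    """Decide AFTER the trace completes, with full knowledge of the outcome."""
    if any(s.is_error for s in spans):
        return True                          # keep 100% of error traces
    if max(s.duration_ms for s in spans) > latency_threshold_ms:
        return True                          # keep 100% of slow traces
    return random.random() < baseline_rate   # keep a sliver of healthy traffic
```

Every error and every slow request is retained; boring, healthy traffic is down-sampled to the baseline rate.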
If you keep 100% of traces temporarily before sampling, where do you store them? How long can you afford to keep them?
Your VP asks: "Why do we sample traces at all? Just store everything."
You explain: "We generate 10 million traces/day. That's 50 TB/day. That's $500K/month storage."
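The arithmetic behind those figures, for the skeptical VP. The ~5 MB-per-trace size is the assumption implied by the quoted numbers; it is on the large side and typical only of span-heavy traces:

```python
traces_per_day = 10_000_000
bytes_per_trace = 5_000_000       # implied by 50 TB/day; an assumption, not a benchmark

daily_tb = traces_per_day * bytes_per_trace / 1e12
monthly_pb = daily_tb * 30 / 1_000    # with ~30-day retention of raw traces

print(daily_tb, monthly_pb)   # 50.0 TB/day, 1.5 PB retained per month
```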
Why can't we just store all traces?
A. Storage costs too much B. Query performance degrades with more data C. Most traces are uninteresting (successful, fast requests) D. All of the above
Answer: D.
But the real question is: Can we keep the important traces and drop the boring ones?
Think of trace sampling as an information-retention problem, not just a cost problem.
The goal: achieve 99% of the debugging value with 1-10% of the data.
Head-based sampling is economically efficient but informationally wasteful. Tail-based sampling inverts the trade-off.
If tail-based sampling is better, why does anyone use head-based sampling?
You decide to implement tail-based sampling. Where does the logic run?
Option 1: Application-side tail sampling (doesn't work: no single service ever sees the complete trace, so no SDK can make a whole-trace decision)
Option 2: Collector-side tail sampling (standard approach)
Tail-based sampling requires buffering complete traces in memory, which means memory usage scales with request rate and trace duration.
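That scaling relationship is Little's law applied to the span buffer: resident data ≈ arrival rate × residency time. A back-of-envelope sketch, where all three input figures are illustrative assumptions:

```python
def collector_buffer_bytes(spans_per_sec: float,
                           avg_span_bytes: float,
                           decision_wait_s: float) -> float:
    """Steady-state buffer size = arrival rate x time each span is held."""
    return spans_per_sec * avg_span_bytes * decision_wait_s

# 50k spans/s at ~1 KB/span, held for a 10 s decision window:
estimate = collector_buffer_bytes(50_000, 1_000, 10)
print(estimate / 1e9)   # 0.5 -> about 0.5 GB resident, before any overhead
```

Double the decision window or the traffic and the buffer doubles with it, which is why the decision timeout is a memory knob, not just a correctness knob.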
What happens if a trace never completes (missing spans due to network issues)? How long do you wait before making a decision?
You have 10 million traces buffered. Which ones do you keep?
Policy 1: Always keep errors
Policy 2: Always keep slow requests
Policy 3: Probabilistic sampling for healthy traces
Policy 4: Rate limiting per attribute
Policy 5: String attribute matching
Effective tail-based sampling uses composite policies: keep 100% of interesting traces (errors, slow) + small % of baseline traffic.
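Policies 1-5 map closely onto policy types in the OpenTelemetry Collector's tail_sampling processor. A sketch of a composite configuration follows; all thresholds, rates, and attribute names are illustrative rather than recommendations, and a trace is kept if any policy matches:

```yaml
processors:
  tail_sampling:
    decision_wait: 10s        # how long to buffer spans before deciding
    num_traces: 100000        # cap on traces held in memory
    policies:
      - name: keep-errors               # Policy 1
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: keep-slow                 # Policy 2
        type: latency
        latency: {threshold_ms: 2000}
      - name: healthy-baseline          # Policy 3
        type: probabilistic
        probabilistic: {sampling_percentage: 5}
      - name: cap-noisy-traffic         # Policy 4 (global cap; per-attribute
        type: rate_limiting             # caps need an `and` composite policy)
        rate_limiting: {spans_per_second: 100}
      - name: debug-flagged             # Policy 5
        type: string_attribute
        string_attribute: {key: debug.enabled, values: ["true"]}
```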
How do you set the "slow request" threshold dynamically as your system's performance changes over time?
You're buffering traces to make tail-based decisions. But how do you know when a trace is complete?
A trace might have spans that arrive late, out of order, or never at all (dropped in transit or lost to a crashed service).
How long do you wait?
When should you make a sampling decision?
A. After receiving root span B. After X seconds of no new spans C. After all parent-child references resolved D. All of the above, with fallbacks
Answer: D - you need multiple heuristics.
Heuristic 1: Root span detection
Heuristic 2: Reference resolution
Heuristic 3: Inactivity timeout
Heuristic 4: Expected span count (if available)
Trace completion is probabilistic, not deterministic. Use multiple heuristics with timeouts to force decisions before memory exhaustion.
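Heuristic 3 plus a hard cap can be sketched as a small buffer. The class and thresholds below are hypothetical; a real collector layers root-span detection and reference resolution on top:

```python
import time

class TraceBuffer:
    """Force a sampling decision via inactivity timeout or a hard max wait."""

    def __init__(self, inactivity_s: float = 10.0, max_wait_s: float = 60.0):
        self.inactivity_s = inactivity_s
        self.max_wait_s = max_wait_s
        # trace_id -> (first_seen, last_seen, spans)
        self.traces: dict[str, tuple[float, float, list]] = {}

    def add_span(self, trace_id: str, span, now=None) -> None:
        now = time.monotonic() if now is None else now
        first, _, spans = self.traces.get(trace_id, (now, now, []))
        spans.append(span)
        self.traces[trace_id] = (first, now, spans)

    def ready_for_decision(self, now=None) -> list[str]:
        """Traces we must decide on NOW, complete or not."""
        now = time.monotonic() if now is None else now
        return [tid for tid, (first, last, _) in self.traces.items()
                if now - last >= self.inactivity_s      # trace has gone quiet
                or now - first >= self.max_wait_s]      # held too long regardless
```

The hard cap is what protects memory: even a trace that keeps trickling in spans gets decided on eventually.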
A long-running async job generates spans over 5 minutes. How do you handle this without waiting 5 minutes to make a sampling decision?
Your system does 100K requests/second. That's 8.6 billion traces/day.
Even with tail-based sampling keeping only 10%, that's 860 million traces to store.
Dimension 1: Collector memory
Dimension 2: Collector throughput
Dimension 3: Trace distribution
Tail-based sampling at scale requires distributed collectors with trace-aware routing and aggressive memory management.
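Trace-aware routing means every span of a given trace must reach the same collector instance; otherwise no single collector ever holds a complete trace. A minimal sketch using modulo hashing (production setups typically use consistent hashing, e.g. the OpenTelemetry load-balancing exporter keyed on trace ID, so scaling events remap only a fraction of in-flight traces):

```python
import hashlib

def route_span(trace_id: str, collectors: list[str]) -> str:
    """Pick a collector deterministically from the trace ID, so all spans
    of one trace converge on the same buffer."""
    h = int(hashlib.sha256(trace_id.encode()).hexdigest(), 16)
    return collectors[h % len(collectors)]
```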
How do you handle a traffic spike that increases your trace volume tenfold? Can you shed load gracefully?
Your manager asks: "Should we use tail-based sampling for everything?"
Pure tail-based sampling is ideal for observability but expensive at scale. Hybrid approaches balance cost and coverage.
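One caveat worth quantifying: in a hybrid pipeline, any trace dropped by the head sampler is gone before the tail policies can examine it, so error coverage is capped by the head rate. A hypothetical helper:

```python
def hybrid_error_coverage(head_rate: float, tail_error_keep: float = 1.0) -> float:
    """Fraction of error traces a hybrid pipeline can possibly retain:
    traces dropped at the head never reach the tail sampler."""
    return head_rate * tail_error_keep

# Head-sample 20% to bound collector cost, keep 100% of errors at the tail:
print(hybrid_error_coverage(0.20))   # 0.2 -> at most 20% of error traces survive
```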
Can you implement tail-based sampling at the application level (SDK) instead of collector, avoiding the buffering problem?
You're the observability lead for a global e-commerce platform.
Requirements:
Constraints:
Write down your design.
1. Storage requirements:
2. Tail-based sampling policies:
3. Collector architecture:
4. Trace completion strategy:
5. Memory management:
6. Cost justification:
Tail-based sampling is a business decision: pay more for storage to save more on incident costs through better debugging.
After implementing tail-based sampling, your storage costs are on target, but query performance degrades (too many traces to search). How do you optimize?
Tail-based sampling design:
Sampling policy checklist:
Trace completion detection:
Memory management:
Scaling considerations:
Cost optimization:
Red flags (redesign needed):