Courses 0%
39
Foundations of System Design · Chapter 39 of 42

Load Balancing

Akhil
Akhil Sharma
20 min

Load Balancing

Distributing incoming traffic across multiple servers so no single machine becomes a bottleneck — the gateway to horizontal scaling.

Load Balancing Basics - The Traffic Director

🎯 Challenge 4: The Restaurant Host Problem

Scenario: You own a restaurant with 5 identical dining rooms. Customers arrive at the entrance.

Without a host:

Entrance │

Room 1: ████████████ (packed, 30 min wait)

Room 2: ████ (some people)

Room 3: ████████████ (packed)

Room 4: ██ (almost empty!)

Room 5: ████████████ (packed)

Problem: Customers just pick random rooms! Result: Long waits, unhappy customers, empty tables

With a smart host:

Entrance

Host (sees all rooms, directs customers smartly)

├─→ Room 1: ██████ (balanced)

├─→ Room 2: ██████ (balanced)

├─→ Room 3: ██████ (balanced)

├─→ Room 4: ██████ (balanced)

└─→ Room 5: ██████ (balanced)

Result: No waits, happy customers, efficient!

This is exactly what a Load Balancer does!


🏗️ What is Load Balancing?

Definition: A load balancer is like a smart traffic director that distributes incoming requests across multiple servers.

img1

All servers: Happy and balanced!

Key insight: Load balancers prevent any single server from being overwhelmed!


🎮 Interactive Exercise: Load Balancing Algorithms

Scenario: You have 3 servers. 9 requests arrive. How do you distribute them?

Let's explore different strategies:

Algorithm 1: Round Robin 🔄

How it works: "Take turns, like dealing cards"

img2

Perfect equality!

Pros: Simple, fair, easy to implement Cons: Doesn't consider server load or speed

Real-world example: Like handing out homework to students in seat order


Algorithm 2: Least Connections 📊

How it works: "Send to whoever is least busy right now"

Starting state:

Server 1: 2 active connections

Server 2: 5 active connections

Server 3: 1 active connection

img3

New request arrives:

→ Check all servers

→ Server 3 has fewest connections (1)

→ Send request to Server 3!

img4

Pros: Accounts for different request durations

Cons: Slightly more complex, needs to track connections

Real-world example: Like a grocery store checkout - you join the shortest line!


Algorithm 3: Weighted Distribution ⚖️

How it works: "Better servers get more work"

Server setup:

Server 1: Power rating 5 (newest, most powerful)

Server 2: Power rating 3 (medium)

Server 3: Power rating 2 (older, slower)

Total power: 10

Distribution ratio: 5:3:2

For 100 requests:

Server 1 gets: 50 requests (50%)

Server 2 gets: 30 requests (30%)

Server 3 gets: 20 requests (20%)

img5

Pros: Utilizes hardware differences efficiently

Cons: Requires knowing server capabilities

Real-world example: Like a restaurant assigning more tables to experienced waiters


Algorithm 4: IP Hash 🔐

How it works: "Same client always goes to same server"

How hashing works:

User IP: 192.168.1.100

→ Hash function: hash(192.168.1.100) = 347

→ 347 % 3 = 2 (modulo number of servers)

→ Always routes to Server 2!

Same user's next request:

→ hash(192.168.1.100) = 347 (same!)

→ 347 % 3 = 2

→ Server 2 again!

Different user IP: 192.168.1.101

→ hash(192.168.1.101) = 892

→ 892 % 3 = 1

→ Routes to Server 1

img6
)

Pros:

  • Session persistence (user's data cached on one server)
  • Predictable routing

Cons:

  • Uneven distribution if many users from same IP range
  • Server failure requires rehashing

Real-world example: Like having a regular doctor - you always see the same one who knows your history


💡 Types of Load Balancers: Layer 4 vs Layer 7

🔌 LAYER 4 (Network Load Balancer)

═══════════════════════════════════════

Works at: TCP/UDP level

Sees: IP addresses, ports

Fast: Very (just forwards packets)

Smart: Not very

Example decision:

img7

Use when: Raw speed needed, simple TCP routing

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🌐 LAYER 7 (Application Load Balancer)

═══════════════════════════════════════

Works at: HTTP/HTTPS level

Sees: URLs, headers, cookies, content

Fast: Slower (must read HTTP)

Smart: Very!

Example decision: "This request is for /api/users" → Route to API servers

"This request is for /images/cat.jpg" → Route to image servers

Internet │ ▼ ┌─────────┐ │ Layer 7 │ "GET /api/users" │ LB │ → Route to API cluster └─────────┘ │ "GET /static/logo.png" │ → Route to CDN servers │ ┌────┴─────┐ ▼ ▼

[API Servers] [Static Servers]

Use when: Need smart routing by content

Real-world comparison:

  • Layer 4: Like a highway toll booth - just directs cars, doesn't care what's inside

  • Layer 7: Like a smart receptionist - reads your request and directs you to the right department

🚨 Common Misconception: "Load Balancers Never Fail"

You might think: "Great! Load balancer solves everything!"

The Reality: Load balancers themselves can fail or become bottlenecks!

The Solution: Multiple Load Balancers!

❌ SINGLE LOAD BALANCER (Risky!)

═════════════════════════════════

img8

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

✅ REDUNDANT LOAD BALANCERS (Safe!)

═════════════════════════════════

img9

Safety: If LB 1 fails, LB 2 continues!


🎪 Real-World Load Balancing Architecture

Let's see how a real company like Netflix might structure their load balancing:

🌍 GLOBAL ARCHITECTURE

═══════════════════════════════════════════

img10

Each zone has:

├─ Layer 4 Load Balancer (Network LB) ← Third layer: TCP routing

├─ 50+ Web Servers

├─ 20+ API Servers

└─ 10+ Database Read Replicas

Why multiple layers?

  1. DNS Layer: Routes by geography (latency optimization)

  2. Application Layer (L7): Routes by URL path, HTTP headers

  3. Network Layer (L4): Raw TCP connection distribution


🔍 Investigation: What Happens When a Server Dies?

Scenario: You have 4 servers behind a load balancer. Server 3 crashes.

Without Health Checks:

Load Balancer: "Round robin time!"

Request 1 → Server 1 ✅ (works)

Request 2 → Server 2 ✅ (works)

Request 3 → Server 3 ❌ (DEAD! User sees error)

Request 4 → Server 4 ✅ (works)

Request 5 → Server 1 ✅ (works)

Request 6 → Server 2 ✅ (works)

Request 7 → Server 3 ❌ (STILL DEAD! Another error)

Result: 25% of requests fail! 😡

With Health Checks (Smart!):

Health Check Configuration:

  • Check every 10 seconds

  • Send HTTP GET to /health

  • Expect 200 OK response

  • If fails 3 times → Mark unhealthy

Timeline: 10:00:00 - Server 3 crashes

10:00:10 - Health check fails (1/3)

10:00:20 - Health check fails (2/3)

10:00:30 - Health check fails (3/3) → REMOVED!

Load Balancer: "Server 3 is out! Skip it!"

Request 1 → Server 1 ✅

Request 2 → Server 2 ✅

Request 3 → Server 4 ✅ (skipped dead Server 3!)

Request 4 → Server 1 ✅

Request 5 → Server 2 ✅

Result: 0% failures! 🎉

10:05:00 - Server 3 comes back online 10:05:10 - Health check succeeds (1/3) 10:05:20 - Health check succeeds (2/3) 10:05:30 - Health check succeeds (3/3) → ADDED BACK!

Load Balancer: "Welcome back, Server 3!"

Key insight: Health checks are CRITICAL for reliability!



🎯 Decision Game: Choose Your Algorithm

For each scenario, pick the best load balancing algorithm:

Scenario A: Simple website, all servers identical

Scenario B: Video encoding - jobs take 5 seconds to 5 minutes

Scenario C: Shopping cart - need same user to hit same server

Scenario D: 1 new powerful server + 2 older slower servers

Scenario E: API with completely independent requests

Think carefully...


ANSWERS:

Scenario A: ROUND ROBIN ✓

Why: Servers identical, simple is best

Scenario B: LEAST CONNECTIONS ✓

Why: Job duration varies widely, need dynamic balancing

Scenario C: IP HASH ✓

Why: Session persistence required for cart data

Scenario D: WEIGHTED ✓

Why: Different server capabilities

Config: New server weight=5, old servers weight=2 each

Scenario E: ROUND ROBIN or RANDOM ✓

Why: Stateless, any algorithm works fine


🎪 The Complete Picture: Load Balancer Features

MODERN LOAD BALANCER CAPABILITIES

═══════════════════════════════════════

✅ Traffic Distribution

├─ Round Robin

├─ Least Connections

├─ Weighted

└─ IP Hash

✅ Health Monitoring

├─ HTTP health checks

├─ TCP health checks

├─ Custom health endpoints

└─ Automatic server removal/re-addition

✅ SSL/TLS Termination

├─ Handle HTTPS encryption

├─ Decrypt at load balancer

└─ Send plain HTTP to backend (reduces server CPU load)

✅ Sticky Sessions

├─ Cookie-based routing

└─ Keep user on same server

✅ Rate Limiting

├─ Protect from DDoS

└─ Throttle aggressive clients

✅ Geographic Routing

├─ Route EU users to EU servers

└─ Minimize latency

✅ Content-Based Routing

├─ /api/* → API servers*

├─ /images/* → Image servers*

└─ /admin/* → Admin servers*

✅ Metrics & Logging

├─ Request counts

├─ Response times

└─ Error rates


🔑 Key Takeaways: Load Balancing

📝 REMEMBER:

═══════════════════════════════════════

1️⃣ Load balancers distribute traffic across servers

2️⃣ Multiple algorithms for different needs:

• Round Robin → Simple rotation

• Least Connections → Dynamic balancing

• Weighted → Different server capabilities

• IP Hash → Session persistence

3️⃣ Health checks prevent routing to dead servers

4️⃣ Layer 4 = Fast & simple (TCP/IP) Layer 7 = Smart & content-aware (HTTP)

5️⃣ Load balancers themselves need redundancy!

6️⃣ Essential for horizontal scaling

Load Balancing Algorithms Comparison

AlgorithmHow it worksBest forDrawback
Round RobinRotate through servers sequentiallyEqual servers, stateless requestsIgnores server load
Least ConnectionsRoute to server with fewest active connectionsVarying request durationsSlightly more overhead
Weighted Round RobinRotate with proportional weight per serverMixed hardware capabilitiesMust configure weights manually
IP HashHash client IP to pick serverSession stickiness without cookiesUneven if IP distribution is skewed
Least Response TimeRoute to fastest responding serverLatency-sensitive applicationsRequires constant health monitoring
RandomPick a server randomlySimple systems with equal serversNo intelligence at all

Layer 4 vs Layer 7

FeatureLayer 4 (Transport)Layer 7 (Application)
Operates onTCP/UDP packetsHTTP requests
SpeedFaster (no content inspection)Slower (reads headers/body)
Routing decisionsIP + port onlyURL path, headers, cookies, body
Use caseRaw throughput, non-HTTPContent-based routing, API gateways
ExampleAWS NLBAWS ALB, Nginx, HAProxy
Chapter complete!

Course Complete!

You've finished all 42 chapters of

Introduction to System Design

Browse courses
Up next Single Point of Failure
Continue