System Design 101 - Part 3: Load Balancing
2025-11-08 • 5 min read

Load Balancing: The Traffic Controller
← Part 2: Vertical vs Horizontal Scaling | Part 3 (You are here) | Part 4: Caching →
The Airport Security Problem
Imagine an airport with 10 security checkpoints.
Scenario 1: No organization
- Passengers randomly pick lines
- Checkpoint 3 has 50 people waiting
- Checkpoint 7 has nobody
- Average wait time: 45 minutes
Scenario 2: A traffic controller
- Someone directs passengers to the shortest lines
- All checkpoints stay busy but not overloaded
- Average wait time: 10 minutes
That traffic controller? That's a load balancer.
What Is a Load Balancer?
Simple definition: A traffic cop for your servers.
When a user visits your website, the load balancer decides which server handles their request.
Without load balancer:
100 users → All hit Server 1 → Server crashes
Server 2 and 3 sit idle
With load balancer:
100 users → Load Balancer → 33 users to Server 1
→ 33 users to Server 2
→ 34 users to Server 3
All servers share the work. Everyone's happy.
Why You Need Load Balancers
1. Better Performance
Spreading traffic prevents any single server from being overwhelmed.
Real numbers: Without load balancing, one server handles 1,000 requests/second and slows down. With 3 servers behind a load balancer, each handles 333 requests/second—still fast.
2. High Availability
If one server crashes, the load balancer stops sending traffic there.
User experience: They don't even notice. Their request just goes to a healthy server.
3. Easy Scaling
Need more capacity? Just add servers behind the load balancer.
No code changes required. The load balancer automatically includes new servers in rotation.
4. Maintenance Without Downtime
Need to update a server? Tell the load balancer to stop sending traffic there, update it, then bring it back.
Users keep browsing. They're talking to the other servers.
How Load Balancers Decide
Load balancers use algorithms to pick which server gets the next request.
1. Round Robin (Most Common)
How it works: Take turns in order.
Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A (back to the start)
Pros:
- Simple and fair
- Works well when all servers are identical
Cons:
- Doesn't account for server load
- Treats all requests equally (some might be heavy)
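Here's a minimal round-robin picker in Python (the server names are placeholders):

```python
import itertools

servers = ["server-a", "server-b", "server-c"]
rotation = itertools.cycle(servers)  # endless A, B, C, A, B, C, ...

def pick_server():
    # Each call returns the next server in the rotation.
    return next(rotation)

for i in range(1, 5):
    print(f"Request {i} -> {pick_server()}")
# Request 1 -> server-a
# Request 2 -> server-b
# Request 3 -> server-c
# Request 4 -> server-a
```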
2. Least Connections
How it works: Send traffic to the server handling the fewest active connections.
Server A: 10 active users → Gets next request
Server B: 25 active users
Server C: 15 active users
Pros:
- Adapts to actual server load
- Better for long-running requests
Cons:
- More complex to track
- Slightly more overhead
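A sketch of the same idea, using the hypothetical connection counts from the example above:

```python
# Hypothetical live connection counts (matching the example above).
active_connections = {"server-a": 10, "server-b": 25, "server-c": 15}

def pick_server():
    # Choose the server with the fewest active connections right now.
    return min(active_connections, key=active_connections.get)

server = pick_server()           # server-a
active_connections[server] += 1  # count the new connection...
# ...and decrement it again when the request completes.
```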
3. Weighted Round Robin
How it works: More powerful servers get more traffic.
Server A (16 cores): Gets 2 requests
Server B (8 cores): Gets 1 request
Pros:
- Optimizes mixed hardware
- Balances actual capacity
Cons:
- Need to configure weights manually
- Requires knowing server capabilities
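One simple way to implement this is to repeat each server in the rotation according to its weight. A sketch, with hypothetical weights matching the example above:

```python
import itertools

# Hypothetical weights: the 16-core box gets twice the traffic.
weighted = [("server-a", 2), ("server-b", 1)]

# Expand the cycle: A, A, B, A, A, B, ...
rotation = itertools.cycle(
    [name for name, weight in weighted for _ in range(weight)]
)

for _ in range(6):
    print(next(rotation))  # server-a, server-a, server-b, ...
```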
4. IP Hash
How it works: User's IP address determines which server they get.
Same user always goes to the same server.
Pros:
- Maintains session consistency
- Good for stateful apps
Cons:
- Uneven distribution if traffic is geographically clustered
- Adding/removing servers disrupts routing
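A sketch of the core idea: hash the client's IP and map it onto the server list. The modulo step is also why adding or removing a server reshuffles most clients.

```python
import hashlib

servers = ["server-a", "server-b", "server-c"]

def pick_server(client_ip: str) -> str:
    # Same IP -> same hash -> same server, every time.
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

print(pick_server("203.0.113.42"))  # always the same server for this IP
# Note: changing len(servers) changes the modulo, remapping most clients.
```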
5. Least Response Time
How it works: Send traffic to the fastest responding server.
Pros:
- Optimal performance
- Automatically routes around slow servers
Cons:
- Requires health checks to measure response times
- More computational overhead
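A sketch using a rolling average of measured latencies (the numbers are hypothetical):

```python
# Hypothetical rolling-average latencies in milliseconds.
avg_latency_ms = {"server-a": 120.0, "server-b": 45.0, "server-c": 80.0}

def record_latency(server: str, measured_ms: float, alpha: float = 0.2) -> None:
    # Exponential moving average: recent responses count more.
    avg_latency_ms[server] = alpha * measured_ms + (1 - alpha) * avg_latency_ms[server]

def pick_server() -> str:
    # Route to the server that has been answering fastest.
    return min(avg_latency_ms, key=avg_latency_ms.get)

print(pick_server())  # server-b
```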
Layer 4 vs Layer 7 Load Balancing
Don't worry about memorizing "layers"—here's what actually matters:
Layer 4 (Transport Layer)
What it sees: IP addresses and ports. Nothing else.
Routing decision: "This request is going to port 443, send it to Server B."
Can't see: URL paths, cookies, HTTP headers.
Pros:
- Very fast (minimal processing)
- Lower resource usage
Cons:
- Basic routing only
- Can't route based on content
Best for: High-throughput applications, gaming servers, streaming.
Layer 7 (Application Layer)
What it sees: Everything—URLs, headers, cookies, request content.
Routing decision: "This request is for /api/, send to API servers. This is for /images/, send to media servers."
Pros:
- Smart routing (different paths to different servers)
- Can handle SSL/TLS termination
- Can inspect and modify requests
Cons:
- Slightly slower (more processing)
- More resource-intensive
Best for: Modern web apps, microservices, APIs.
Most companies use Layer 7 because the flexibility is worth the tiny performance cost.
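To show what "smart routing" looks like, here's a toy Layer 7 router in Python. The path prefixes and pool names are made up for illustration; real deployments do this in nginx, HAProxy, or a cloud load balancer.

```python
# Map URL path prefixes to server pools (names are illustrative).
pools = {
    "/api/": ["api-1", "api-2"],
    "/images/": ["media-1", "media-2"],
}
default_pool = ["app-1", "app-2", "app-3"]

def route(path: str) -> list:
    # First matching prefix wins; anything else goes to the app servers.
    for prefix, pool in pools.items():
        if path.startswith(prefix):
            return pool
    return default_pool

print(route("/api/users"))  # ['api-1', 'api-2']
print(route("/checkout"))   # ['app-1', 'app-2', 'app-3']
```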
Health Checks: Keeping Everything Running
Load balancers constantly check if servers are alive.
How it works:
Every 10 seconds:
Load balancer → "Hey Server A, you alive?" → Server A: "Yes!"
Load balancer → "Hey Server B, you alive?" → [No response]
Load balancer → Stops sending traffic to Server B
Types of health checks:
Basic ping: Is the server responding at all?
HTTP check: Does it return a 200 OK status?
Deep check: Does the database connection work? Can it process requests?
Real example: If your database dies, your servers might be "up" but can't serve traffic. A deep health check catches this.
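Here's a sketch of an HTTP health check. The /health endpoint and the server addresses are assumptions; many apps expose an endpoint like this, but yours might call it something else.

```python
import urllib.request

servers = ["http://10.0.0.1:8080", "http://10.0.0.2:8080"]  # hypothetical

def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    # HTTP check: healthy means /health answers 200 OK within the timeout.
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

# Run this every 10-30 seconds and only route to servers in `healthy`.
healthy = [s for s in servers if is_healthy(s)]
print(healthy)
```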
Real Story: Shopify's Black Friday
The challenge: Black Friday 2023 saw peak traffic 10x normal levels.
Without load balancing:
- All traffic hits same servers
- Instant crashes
- Millions in lost sales
With load balancing:
- Traffic distributed across 1,000+ servers globally
- Automatic scaling (added servers as traffic increased)
- Health checks removed any failing servers instantly
- Result: $9.3 billion in sales, 99.99% uptime
Where Load Balancers Live in Your Architecture
Simple setup:
Users → Load Balancer → [Server 1, Server 2, Server 3] → Database
Production setup:
Users → CDN → Load Balancer → [App Servers] → Load Balancer → [API Servers] → Database
Modern setup:
Users → CDN
↓
Global Load Balancer (routes to nearest region)
↓
Regional Load Balancer (distributes within region)
↓
[Servers] → [Databases]
Netflix uses the modern approach—their load balancers route you to the closest data center automatically.
Common Mistakes
Mistake 1: No Health Checks
Problem: Load balancer keeps sending traffic to dead servers.
Fix: Configure health checks (every 10-30 seconds).
Mistake 2: Single Load Balancer
Problem: Load balancer becomes single point of failure.
Fix: Use redundant load balancers (most cloud providers do this automatically).
Mistake 3: Using IP Hash When You Don't Need To
Problem: Uneven distribution, harder to scale.
Fix: Make your app stateless, use Round Robin or Least Connections.
Mistake 4: Not Monitoring Load Balancer Itself
Problem: Load balancer maxes out, becomes bottleneck.
Fix: Monitor load balancer metrics (connections/sec, CPU, network).
Your Challenge
Think about an app you use (Spotify, Reddit, YouTube):
- How many servers do they probably have?
- What load balancing algorithm makes sense for them?
- How do they handle when you're watching a video and a server crashes?
- Where are the load balancers in their architecture?
Key Takeaways
- Load balancers distribute traffic across multiple servers
- They enable high availability (no single point of failure)
- Round Robin is simple and works for most cases
- Layer 7 load balancers are more flexible (most common choice)
- Health checks automatically remove failed servers
- Every major website uses load balancing
- Without them, horizontal scaling doesn't work
Next up: Part 4: Caching →
You've got traffic distributed across servers. Now let's make everything 10x faster with caching.
Written by Amika Deshapriya. Making system design simple, one story at a time.
Connect: LinkedIn | GitHub | Newsletter