System Design 101 - Part 3: Load Balancing
2025-11-08 • 5 min read

Load Balancing: The Traffic Controller
← Part 2: Vertical vs Horizontal Scaling | Part 3 (You are here) | Part 4: Caching →
The Airport Security Problem
Imagine an airport with 10 security checkpoints.
Scenario 1: No organization
- Passengers randomly pick lines
- Checkpoint 3 has 50 people waiting
- Checkpoint 7 has nobody
- Average wait time: 45 minutes
Scenario 2: A traffic controller
- Someone directs passengers to the shortest lines
- All checkpoints stay busy but not overloaded
- Average wait time: 10 minutes
That traffic controller? That's a load balancer.
What Is a Load Balancer?
Simple definition: A traffic cop for your servers.
When a user visits your website, the load balancer decides which server handles their request.
Without load balancer:
100 users → All hit Server 1 → Server crashes
Server 2 and 3 sit idle
With load balancer:
100 users → Load Balancer → 33 users to Server 1
→ 33 users to Server 2
→ 34 users to Server 3
All servers share the work. Everyone's happy.
Why You Need Load Balancers
1. Better Performance
Spreading traffic prevents any single server from being overwhelmed.
Real numbers: Without load balancing, one server handles 1,000 requests/second and slows down. With 3 servers behind a load balancer, each handles 333 requests/second—still fast.
2. High Availability
If one server crashes, the load balancer stops sending traffic there.
User experience: They don't even notice. Their request just goes to a healthy server.
3. Easy Scaling
Need more capacity? Just add servers behind the load balancer.
No code changes required. The load balancer automatically includes new servers in rotation.
4. Maintenance Without Downtime
Need to update a server? Tell the load balancer to stop sending traffic there, update it, then bring it back.
Users keep browsing. They're talking to the other servers.
How Load Balancers Decide
Load balancers use algorithms to pick which server gets the next request.
1. Round Robin (Most Common)
How it works: Take turns in order.
Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A (back to the start)
Pros:
- Simple and fair
- Works well when all servers are identical
Cons:
- Doesn't account for server load
- Treats all requests equally (some might be heavy)
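Here's a minimal round-robin picker in Python (the server names are placeholders):

```python
import itertools

servers = ["server-a", "server-b", "server-c"]
rotation = itertools.cycle(servers)  # endless A, B, C, A, B, C, ...

def pick_server():
    # Each call returns the next server in the rotation.
    return next(rotation)

for i in range(1, 5):
    print(f"Request {i} -> {pick_server()}")
# Request 1 -> server-a
# Request 2 -> server-b
# Request 3 -> server-c
# Request 4 -> server-a
```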
2. Least Connections
How it works: Send traffic to the server handling the fewest active connections.
Server A: 10 active users → Gets next request
Server B: 25 active users
Server C: 15 active users
Pros:
- Adapts to actual server load
- Better for long-running requests
Cons:
- More complex to track
- Slightly more overhead
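A sketch of the same idea, using the hypothetical connection counts from the example above:

```python
# Hypothetical live connection counts (matching the example above).
active_connections = {"server-a": 10, "server-b": 25, "server-c": 15}

def pick_server():
    # Choose the server with the fewest active connections right now.
    return min(active_connections, key=active_connections.get)

server = pick_server()           # server-a
active_connections[server] += 1  # count the new connection...
# ...and decrement it again when the request completes.
```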
3. Weighted Round Robin
How it works: More powerful servers get more traffic.
Server A (16 cores): Gets 2 requests
Server B (8 cores): Gets 1 request
Pros:
- Optimizes mixed hardware
- Balances actual capacity
Cons:
- Need to configure weights manually
- Requires knowing server capabilities
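One simple way to implement this is to repeat each server in the rotation according to its weight. A sketch, with hypothetical weights matching the example above:

```python
import itertools

# Hypothetical weights: the 16-core box gets twice the traffic.
weighted = [("server-a", 2), ("server-b", 1)]

# Expand the cycle: A, A, B, A, A, B, ...
rotation = itertools.cycle(
    [name for name, weight in weighted for _ in range(weight)]
)

for _ in range(6):
    print(next(rotation))  # server-a, server-a, server-b, ...
```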
4. IP Hash
How it works: User's IP address determines which server they get.
Same user always goes to the same server.
Pros:
- Maintains session consistency
- Good for stateful apps
Cons:
- Uneven distribution if traffic is geographically clustered
- Adding/removing servers disrupts routing
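A sketch of the core idea: hash the client's IP and map it onto the server list. The modulo step is also why adding or removing a server reshuffles most clients.

```python
import hashlib

servers = ["server-a", "server-b", "server-c"]

def pick_server(client_ip: str) -> str:
    # Same IP -> same hash -> same server, every time.
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

print(pick_server("203.0.113.42"))  # always the same server for this IP
# Note: changing len(servers) changes the modulo, remapping most clients.
```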
5. Least Response Time
How it works: Send traffic to the fastest responding server.
Pros:
- Optimal performance
- Automatically routes around slow servers
Cons:
- Requires health checks to measure response times
- More computational overhead
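A sketch using a rolling average of measured latencies (the numbers are hypothetical):

```python
# Hypothetical rolling-average latencies in milliseconds.
avg_latency_ms = {"server-a": 120.0, "server-b": 45.0, "server-c": 80.0}

def record_latency(server: str, measured_ms: float, alpha: float = 0.2) -> None:
    # Exponential moving average: recent responses count more.
    avg_latency_ms[server] = alpha * measured_ms + (1 - alpha) * avg_latency_ms[server]

def pick_server() -> str:
    # Route to the server that has been answering fastest.
    return min(avg_latency_ms, key=avg_latency_ms.get)

print(pick_server())  # server-b
```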
Layer 4 vs Layer 7 Load Balancing
Don't worry about memorizing "layers"—here's what actually matters:
Layer 4 (Transport Layer)
What it sees: IP addresses and ports. Nothing else.
Routing decision: "This request is going to port 443, send it to Server B."
Can't see: URL paths, cookies, HTTP headers.
Pros:
- Very fast (minimal processing)
- Lower resource usage
Cons:
- Basic routing only
- Can't route based on content
Best for: High-throughput applications, gaming servers, streaming.
Layer 7 (Application Layer)
What it sees: Everything—URLs, headers, cookies, request content.
Routing decision: "This request is for /api/, send to API servers. This is for /images/, send to media servers."
Pros:
- Smart routing (different paths to different servers)
- Can handle SSL/TLS termination
- Can inspect and modify requests
Cons:
- Slightly slower (more processing)
- More resource-intensive
Best for: Modern web apps, microservices, APIs.
Most companies use Layer 7 because the flexibility is worth the tiny performance cost.
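To show what "smart routing" looks like, here's a toy Layer 7 router in Python. The path prefixes and pool names are made up for illustration; real deployments do this in nginx, HAProxy, or a cloud load balancer.

```python
# Map URL path prefixes to server pools (names are illustrative).
pools = {
    "/api/": ["api-1", "api-2"],
    "/images/": ["media-1", "media-2"],
}
default_pool = ["app-1", "app-2", "app-3"]

def route(path: str) -> list:
    # First matching prefix wins; anything else goes to the app servers.
    for prefix, pool in pools.items():
        if path.startswith(prefix):
            return pool
    return default_pool

print(route("/api/users"))  # ['api-1', 'api-2']
print(route("/checkout"))   # ['app-1', 'app-2', 'app-3']
```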
Health Checks: Keeping Everything Running
Load balancers constantly check if servers are alive.
How it works:
Every 10 seconds:
Load balancer → "Hey Server A, you alive?" → Server A: "Yes!"
Load balancer → "Hey Server B, you alive?" → [No response]
Load balancer → Stops sending traffic to Server B
Types of health checks:
Basic ping: Is the server responding at all?
HTTP check: Does it return a 200 OK status?
Deep check: Does the database connection work? Can it process requests?
Real example: If your database dies, your servers might be "up" but can't serve traffic. A deep health check catches this.
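Here's a sketch of an HTTP health check. The /health endpoint and the server addresses are assumptions; many apps expose an endpoint like this, but yours might call it something else.

```python
import urllib.request

servers = ["http://10.0.0.1:8080", "http://10.0.0.2:8080"]  # hypothetical

def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    # HTTP check: healthy means /health answers 200 OK within the timeout.
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

# Run this every 10-30 seconds and only route to servers in `healthy`.
healthy = [s for s in servers if is_healthy(s)]
print(healthy)
```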
Real Story: Shopify's Black Friday
The challenge: Black Friday 2023 saw peak traffic 10x normal levels.
Without load balancing:
- All traffic hits same servers
- Instant crashes
- Millions in lost sales
With load balancing:
- Traffic distributed across 1,000+ servers globally
- Automatic scaling (added servers as traffic increased)
- Health checks removed any failing servers instantly
- Result: $9.3 billion in sales, 99.99% uptime
Where Load Balancers Live in Your Architecture
Simple setup:
Users → Load Balancer → [Server 1, Server 2, Server 3] → Database
Production setup:
Users → CDN → Load Balancer → [App Servers] → Load Balancer → [API Servers] → Database
Modern setup:
Users → CDN
↓
Global Load Balancer (routes to nearest region)
↓
Regional Load Balancer (distributes within region)
↓
[Servers] → [Databases]
Netflix uses the modern approach—their load balancers route you to the closest data center automatically.
Common Mistakes
Mistake 1: No Health Checks
Problem: Load balancer keeps sending traffic to dead servers.
Fix: Configure health checks (every 10-30 seconds).
Mistake 2: Single Load Balancer
Problem: Load balancer becomes single point of failure.
Fix: Use redundant load balancers (most cloud providers do this automatically).
Mistake 3: Using IP Hash When You Don't Need To
Problem: Uneven distribution, harder to scale.
Fix: Make your app stateless, use Round Robin or Least Connections.
Mistake 4: Not Monitoring Load Balancer Itself
Problem: Load balancer maxes out, becomes bottleneck.
Fix: Monitor load balancer metrics (connections/sec, CPU, network).
Your Challenge
Think about an app you use (Spotify, Reddit, YouTube):
- How many servers do they probably have?
- What load balancing algorithm makes sense for them?
- How do they handle when you're watching a video and a server crashes?
- Where are the load balancers in their architecture?
Key Takeaways
- Load balancers distribute traffic across multiple servers
- They enable high availability (no single point of failure)
- Round Robin is simple and works for most cases
- Layer 7 load balancers are more flexible (most common choice)
- Health checks automatically remove failed servers
- Every major website uses load balancing
- Without them, horizontal scaling doesn't work
Next up: Part 4: Caching →
You've got traffic distributed across servers. Now let's make everything 10x faster with caching.
Written by Amika Deshapriya. Making system design simple, one story at a time.
Connect: LinkedIn | GitHub | Newsletter