
System Design 101 - Part 3: Load Balancing

2025-11-08 • 5 min read

By Amika Deshapriya
System Design 101 • Part 3

Load Balancing: The Traffic Controller

← Part 2: Vertical vs Horizontal Scaling | Part 3 (You are here) | Part 4: Caching →


The Airport Security Problem

Imagine an airport with 10 security checkpoints.

Scenario 1: No organization

  • Passengers randomly pick lines
  • Checkpoint 3 has 50 people waiting
  • Checkpoint 7 has nobody
  • Average wait time: 45 minutes
Scenario 2: Smart traffic controller

  • Someone directs passengers to shortest lines
  • All checkpoints stay busy but not overloaded
  • Average wait time: 10 minutes

Same number of checkpoints. Same number of passengers. Massively different experience.

That traffic controller? That's a load balancer.


What Is a Load Balancer?

Simple definition: A traffic cop for your servers.

When a user visits your website, the load balancer decides which server handles their request.

Without load balancer:


100 users → All hit Server 1 → Server crashes
             Server 2 and 3 sit idle

With load balancer:


100 users → Load Balancer → 33 users to Server 1
                          → 33 users to Server 2
                          → 34 users to Server 3

All servers share the work. Everyone's happy.


Why You Need Load Balancers

1. Better Performance

Spreading traffic prevents any single server from being overwhelmed.

Real numbers: Without load balancing, one server handles 1,000 requests/second and slows down. With 3 servers behind a load balancer, each handles 333 requests/second—still fast.

2. High Availability

If one server crashes, the load balancer stops sending traffic there.

User experience: They don't even notice. Their request just goes to a healthy server.

3. Easy Scaling

Need more capacity? Just add servers behind the load balancer.

No code changes required. The load balancer automatically includes new servers in rotation.

4. Maintenance Without Downtime

Need to update a server? Tell the load balancer to stop sending traffic there, update it, then bring it back.

Users keep browsing. They're talking to the other servers.


How Load Balancers Decide

Load balancers use algorithms to pick which server gets the next request.

1. Round Robin (Most Common)

How it works: Take turns in order.


Request 1 → Server A
Request 2 → Server B  
Request 3 → Server C
Request 4 → Server A (back to the start)

Pros:

  • Simple and fair
  • Works well when all servers are identical
Cons:

  • Doesn't account for server load
  • Treats all requests equally (some might be heavy)

Best for: Apps where requests take similar time and servers have the same specs.
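
Here's the idea as a minimal Python sketch (the server names are placeholders):

import itertools

# Cycle through the servers in order, forever.
servers = ["server-a", "server-b", "server-c"]
rotation = itertools.cycle(servers)

def pick_server():
    return next(rotation)

for i in range(4):
    print(f"Request {i + 1} -> {pick_server()}")
# Request 1 -> server-a
# Request 2 -> server-b
# Request 3 -> server-c
# Request 4 -> server-a (back to the start)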

2. Least Connections

How it works: Send traffic to the server handling fewest active connections.


Server A: 10 active users → Gets next request
Server B: 25 active users
Server C: 15 active users

Pros:

  • Adapts to actual server load
  • Better for long-running requests
Cons:

  • More complex to track
  • Slightly more overhead

Best for: Apps with variable request times (file uploads, video processing).
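
A minimal Python sketch, assuming the balancer tracks active connections per server (the counts below are made up):

# Route each request to the server with the fewest active connections.
active_connections = {"server-a": 10, "server-b": 25, "server-c": 15}

def pick_server():
    server = min(active_connections, key=active_connections.get)
    active_connections[server] += 1  # this request is now active there
    return server

print(pick_server())  # server-a (only 10 active connections)

A real balancer also decrements the count whenever a connection closes; that bookkeeping is the "more complex to track" part.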

3. Weighted Round Robin

How it works: More powerful servers get more traffic.


Server A (16 cores): Gets 2 requests
Server B (8 cores):  Gets 1 request

Pros:

  • Optimizes mixed hardware
  • Balances actual capacity
Cons:

  • Need to configure weights manually
  • Requires knowing server capabilities

Best for: When you have different server sizes (common during scaling transitions).
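
A minimal Python sketch with hand-configured weights (2 for the 16-core box, 1 for the 8-core one):

import itertools

# Each server appears in the rotation once per unit of weight.
weights = {"server-a-16core": 2, "server-b-8core": 1}
rotation = itertools.cycle(
    [name for name, weight in weights.items() for _ in range(weight)]
)

for i in range(6):
    print(f"Request {i + 1} -> {next(rotation)}")
# a, a, b, a, a, b: the 16-core server gets twice the traffic

Real balancers (NGINX's smooth weighted round robin, for example) interleave the picks more evenly, but the proportions come out the same.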

4. IP Hash

How it works: User's IP address determines which server they get.

Same user always goes to the same server.

Pros:

  • Maintains session consistency
  • Good for stateful apps
Cons:

  • Uneven distribution if traffic is geographically clustered
  • Adding/removing servers disrupts routing

Best for: Applications that haven't been made stateless yet (temporary solution).
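
A minimal Python sketch: hash the client's IP, take it modulo the server count, and the same IP always lands on the same server:

import hashlib

servers = ["server-a", "server-b", "server-c"]

def pick_server(client_ip):
    # Hash the IP so the mapping is stable but roughly uniform.
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

print(pick_server("203.0.113.7"))  # same server...
print(pick_server("203.0.113.7"))  # ...every single time

This also makes the second con concrete: add or remove a server and len(servers) changes, so almost every IP gets remapped to a new server. Consistent hashing is the usual fix.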

5. Least Response Time

How it works: Send traffic to the fastest responding server.

Pros:

  • Optimal performance
  • Automatically routes around slow servers
Cons:

  • Requires continuous response-time monitoring
  • More computational overhead

Best for: Performance-critical applications, APIs with strict SLAs.
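
A minimal Python sketch, assuming the balancer records how quickly each server answered its most recent check (the timings below are made up):

# Route to whichever server has been responding fastest.
last_response_ms = {"server-a": 42.0, "server-b": 8.5, "server-c": 120.0}

def pick_server():
    return min(last_response_ms, key=last_response_ms.get)

print(pick_server())  # server-b (8.5 ms)

Keeping those timings fresh for every server is the extra overhead the cons mention.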


Layer 4 vs Layer 7 Load Balancing

Don't worry about memorizing "layers"—here's what actually matters:

Layer 4 (Transport Layer)

What it sees: IP addresses and ports. Nothing else.

Routing decision: "This request is going to port 443, send it to Server B."

Can't see: URL paths, cookies, HTTP headers.

Pros:

  • Very fast (minimal processing)
  • Lower resource usage
Cons:

  • Basic routing only
  • Can't route based on content

Example: AWS Network Load Balancer

Best for: High-throughput applications, gaming servers, streaming.

Layer 7 (Application Layer)

What it sees: Everything—URLs, headers, cookies, request content.

Routing decision: "This request is for /api/, send to API servers. This is for /images/, send to media servers."

Pros:

  • Smart routing (different paths to different servers)
  • Can handle SSL/TLS termination
  • Can inspect and modify requests
Cons:

  • Slightly slower (more processing)
  • More resource-intensive

Examples: NGINX, HAProxy, AWS Application Load Balancer

Best for: Modern web apps, microservices, APIs.

Most companies use Layer 7 because the flexibility is worth the tiny performance cost.
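
To make the Layer 7 idea concrete, here's a toy path-based router in Python. The pool names and path prefixes are made up, and a real Layer 7 balancer does this with configuration rules rather than application code:

# Layer 7 routing: inspect the URL path (invisible at Layer 4)
# and pick a server pool based on it.
pools = {
    "/api/": ["api-1", "api-2"],
    "/images/": ["media-1", "media-2"],
}
default_pool = ["web-1", "web-2"]

def route(path):
    for prefix, pool in pools.items():
        if path.startswith(prefix):
            return pool[0]  # a real balancer would round-robin within the pool
    return default_pool[0]

print(route("/api/users"))        # api-1
print(route("/images/logo.png"))  # media-1
print(route("/about"))            # web-1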


Health Checks: Keeping Everything Running

Load balancers constantly check if servers are alive.

How it works:


Every 10 seconds:
  Load balancer → "Hey Server A, you alive?" → Server A: "Yes!"
  Load balancer → "Hey Server B, you alive?" → [No response]
  Load balancer → Stops sending traffic to Server B

Types of health checks:

Basic ping: Is the server responding at all?

HTTP check: Does it return a 200 OK status?

Deep check: Does the database connection work? Can it process requests?

Real example: If your database dies, your servers might be "up" but can't serve traffic. A deep health check catches this.
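
Here's a minimal sketch of that loop in Python. The server addresses and the /health endpoint are placeholders; a real balancer runs this on a timer (every 10 seconds, say):

import urllib.request

servers = ["http://10.0.0.1:8080", "http://10.0.0.2:8080"]
healthy = set(servers)

def is_alive(server):
    try:
        with urllib.request.urlopen(server + "/health", timeout=2) as resp:
            return resp.status == 200  # HTTP check: expect 200 OK
    except OSError:
        return False  # no response within 2 seconds: treat as dead

def run_checks():
    for server in servers:
        if is_alive(server):
            healthy.add(server)      # recovered servers rejoin the pool
        else:
            healthy.discard(server)  # stop sending traffic here

A deep check is the same loop, but /health itself verifies the database connection before returning 200.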


Real Story: Shopify's Black Friday

The challenge: Black Friday 2023 brought peak traffic at 10x normal levels.

Without load balancing:

  • All traffic hits same servers
  • Instant crashes
  • Millions in lost sales
With load balancing:

  • Traffic distributed across 1,000+ servers globally
  • Automatic scaling (added servers as traffic increased)
  • Health checks removed any failing servers instantly
  • Result: $9.3 billion in sales, 99.99% uptime

The key: Load balancers made scaling seamless. They added hundreds of servers during the event without any user noticing.


Where Load Balancers Live in Your Architecture

Simple setup:


Users → Load Balancer → [Server 1, Server 2, Server 3] → Database

Production setup:


Users → CDN → Load Balancer → [App Servers] → Load Balancer → [API Servers] → Database

Modern setup:


Users → CDN 
       ↓
Global Load Balancer (routes to nearest region)
       ↓
Regional Load Balancer (distributes within region)
       ↓
[Servers] → [Databases]

Netflix uses the modern approach—their load balancers route you to the closest data center automatically.


Common Mistakes

Mistake 1: No Health Checks

Problem: Load balancer keeps sending traffic to dead servers.

Fix: Configure health checks (every 10-30 seconds).

Mistake 2: Single Load Balancer

Problem: Load balancer becomes single point of failure.

Fix: Use redundant load balancers (most cloud providers do this automatically).

Mistake 3: Using IP Hash When You Don't Need To

Problem: Uneven distribution, harder to scale.

Fix: Make your app stateless, use Round Robin or Least Connections.

Mistake 4: Not Monitoring Load Balancer Itself

Problem: Load balancer maxes out, becomes bottleneck.

Fix: Monitor load balancer metrics (connections/sec, CPU, network).


Your Challenge

Think about an app you use (Spotify, Reddit, YouTube):

  • How many servers do they probably have?
  • What load balancing algorithm makes sense for them?
  • How do they handle when you're watching a video and a server crashes?
  • Where are the load balancers in their architecture?

This is how senior engineers think about systems.

Key Takeaways

  • Load balancers distribute traffic across multiple servers
  • They enable high availability (no single point of failure)
  • Round Robin is simple and works for most cases
  • Layer 7 load balancers are more flexible (most common choice)
  • Health checks automatically remove failed servers
  • Every major website uses load balancing
  • Without them, horizontal scaling doesn't work

Next up: Part 4: Caching →

You've got traffic distributed across servers. Now let's make everything 10x faster with caching.


Written by Amika Deshapriya
Making system design simple, one story at a time.

Connect: LinkedIn | GitHub | Newsletter