Vertical vs Horizontal Scaling Explained


The Restaurant Problem

Your restaurant is packed every night. People are waiting 2 hours for a table.

You have two options:

Option 1: Build a bigger restaurant

Knock down walls
Add more tables in the same building
Get a bigger kitchen
Hire a super-chef who can cook twice as fast

Option 2: Open more locations

Keep the original restaurant
Open 5 smaller branches across the city
Each location handles its own neighborhood

Both solve the problem. But they work completely differently.

This is vertical vs horizontal scaling.

Vertical Scaling (Scaling Up)

Simple definition: Making your single server more powerful.

Think of it like: Upgrading from a Honda Civic to a Ferrari. Same driver, same road, just a faster car.

What you're upgrading:

More CPU cores (4 cores → 16 cores)
More RAM (8GB → 64GB)
Faster storage (HDD → SSD)
Better network cards

Real example: Your app runs on one server with 4GB RAM. It's slowing down. You upgrade to 16GB RAM. Problem solved—for now.

When Vertical Scaling Works

Good for:

Early-stage apps (roughly under 10,000 users, as a rule of thumb)
Databases (they're hard to split across servers)
Quick fixes while you plan for horizontal scaling
Applications not designed for multiple servers

Stack Overflow's approach: For years, they handled millions of users on just a handful of very powerful servers. They scaled vertically first because their code wasn't built for horizontal scaling.

The Problem With Vertical Scaling

You hit a ceiling.

You can't infinitely upgrade one server. Eventually, you reach hardware limits.

Other issues:

Expensive at the top end: Price climbs faster than capacity, so one server with 256GB RAM often costs more than 4 servers with 64GB each
Single point of failure: If that one super-server crashes, everything goes down
Downtime for upgrades: You usually have to shut the server down, or at least restart it, to add RAM or CPU
Limited by physics: There's only so much you can pack into one machine

When one machine is no longer enough: This is when you switch to horizontal scaling.

Horizontal Scaling (Scaling Out)

Simple definition: Adding more servers instead of upgrading one.

Think of it like: Instead of one super-fast delivery driver, you hire 10 regular drivers. More coverage, more reliability.

What you're doing:

Running your app on 10 servers instead of 1
Each server handles some of the traffic
If one fails, the other 9 keep working
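To make the list above concrete, here is a minimal sketch of round-robin traffic distribution that skips a crashed server. The server names and the `route_requests` helper are hypothetical and exist only for illustration; a real setup would put a load balancer with health checks in front of the servers.

```python
from itertools import cycle


def route_requests(servers, healthy, num_requests):
    """Assign each request to the next healthy server in round-robin order."""
    if not any(s in healthy for s in servers):
        raise RuntimeError("no healthy servers available")
    rotation = cycle(servers)
    assignments = []
    while len(assignments) < num_requests:
        server = next(rotation)
        if server in healthy:  # skip servers that are down
            assignments.append(server)
    return assignments


servers = [f"server-{i}" for i in range(1, 11)]  # 10 hypothetical servers
healthy = set(servers) - {"server-3"}            # pretend server-3 crashed

assignments = route_requests(servers, healthy, num_requests=18)
print(assignments)
```

The 18 requests are spread across the 9 surviving servers, and nothing is ever sent to the one that failed.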

Real example: Netflix runs thousands of servers across the globe. If 100 servers crash, you don't even notice—the remaining thousands keep streaming.

When Horizontal Scaling Works

Good for:

Apps with 10,000+ users (a rough rule of thumb, not a hard threshold)
Global applications (users in different countries)
Services that need 99.9%+ uptime
Unpredictable traffic spikes
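The uptime point can be backed with a little arithmetic. The sketch below assumes servers fail independently and that any single surviving server can keep the service up; both are optimistic simplifications. The 99% per-server figure is a made-up input, not a measurement.

```python
def fleet_availability(per_server, n):
    """Probability that at least one of n independent servers is up."""
    return 1 - (1 - per_server) ** n


# Hypothetical input: each server is up 99% of the time.
for n in (1, 2, 3):
    print(n, f"{fleet_availability(0.99, n):.6f}")
```

Under those assumptions, a second server moves you from "two nines" to "four nines". In practice failures are often correlated (same data center, same bad deploy), so real gains are smaller.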

Netflix's approach: They don't have one giant server. They have thousands of smaller servers. During peak hours (8 PM when everyone's watching), they automatically add more servers. At 3 AM, they scale back down.
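The idea of adding servers at peak and removing them at night can be sketched as a simple sizing rule. This is not Netflix's actual system; the function name, the per-server capacity, and the min/max bounds are all assumptions chosen for illustration.

```python
import math


def desired_servers(requests_per_sec, capacity_per_server,
                    min_servers=2, max_servers=100):
    """Servers needed for the current load, clamped to a safe range."""
    needed = math.ceil(requests_per_sec / capacity_per_server)
    return max(min_servers, min(needed, max_servers))


# Hypothetical traffic: each server is assumed to handle 500 requests/sec.
print(desired_servers(9000, 500))  # evening peak
print(desired_servers(300, 500))   # quiet hours, held at the minimum
```

Keeping a minimum above one preserves redundancy even when traffic is low, and the maximum caps the bill if traffic numbers go haywire.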

The Catch: Your App Must Be Stateless

Here's where it gets tricky.

Stateful app: Server remembers things about each user.

Your login session stored in server memory
Your shopping cart saved on a specific server
If that server dies, your data is gone

Problem: With 10 servers, user 1 might connect to server A, then their next request goes to server B—which doesn't know who they are.

Stateless app: Server keeps no per-user data between requests.

User sends authentication token with every request
Session data stored in a separate database/cache (Redis)
Shopping cart saved in database, not server memory
Any server can handle any request

This is the key to horizontal scaling.

Making Your App Stateless

Bad (Stateful):


User logs in → Server A stores "John is logged in" in memory
User refreshes page → Request goes to Server B → Server B says "Who's John?"
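The stateful failure above can be reproduced in a few lines. This is a toy model, with a hypothetical `StatefulServer` class standing in for two separate processes that each keep sessions in their own memory.

```python
class StatefulServer:
    """Toy server that keeps login sessions in its own memory."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # lives only inside this one server

    def login(self, user):
        self.sessions[user] = {"logged_in": True}
        return f"{self.name}: welcome, {user}"

    def handle(self, user):
        if user not in self.sessions:
            return f"{self.name}: who's {user}?"
        return f"{self.name}: hello again, {user}"


server_a = StatefulServer("Server A")
server_b = StatefulServer("Server B")

print(server_a.login("John"))   # login lands on Server A
print(server_b.handle("John"))  # next request lands on Server B
```

Server B has never seen John, because the session lives only in Server A's dictionary.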

Good (Stateless):


User logs in → Server gives John a token (like a ticket)
John sends token with every request
Any server can verify the token and know it's John
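Here is a minimal sketch of the token flow above, using an HMAC signature from Python's standard library. The secret key, the token layout, and the function names are assumptions for illustration; production systems normally use an established token format with expiry rather than a hand-rolled one.

```python
import hashlib
import hmac

# Hypothetical secret shared by every server; load it from config in practice.
SECRET_KEY = b"demo-secret-shared-by-all-servers"


def issue_token(user):
    """Create a 'user.signature' token at login."""
    signature = hmac.new(SECRET_KEY, user.encode(), hashlib.sha256).hexdigest()
    return f"{user}.{signature}"


def verify_token(token):
    """Return the user name if the signature checks out, else None."""
    user, _, signature = token.rpartition(".")
    if not user:
        return None
    expected = hmac.new(SECRET_KEY, user.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the signature matched.
    if hmac.compare_digest(signature.encode(), expected.encode()):
        return user
    return None


token = issue_token("John")
print(verify_token(token))
```

Any server holding the same secret can run `verify_token`, so it no longer matters which server handled the login.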

Where to store session data:

Redis (in-memory cache)
Database
JWTs (signed tokens sent with each request; the payload is readable, not encrypted, so keep secrets out of it)
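The external-store option can be sketched like this. `SessionStore` is an in-process stand-in for something like Redis or a database, and both class names are hypothetical. The point is only that the servers share the store instead of keeping sessions themselves.

```python
class SessionStore:
    """In-process stand-in for an external store such as Redis or a database."""

    def __init__(self):
        self._data = {}

    def save(self, session_id, data):
        self._data[session_id] = data

    def load(self, session_id):
        return self._data.get(session_id)


class AppServer:
    """Stateless server: all session data goes through the shared store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        session = self.store.load(session_id) or {"cart": []}
        session["cart"].append(item)
        self.store.save(session_id, session)

    def view_cart(self, session_id):
        session = self.store.load(session_id) or {"cart": []}
        return session["cart"]


store = SessionStore()
server_a = AppServer("Server A", store)
server_b = AppServer("Server B", store)

server_a.add_to_cart("session-1", "book")
server_b.add_to_cart("session-1", "pen")
print(server_b.view_cart("session-1"))
```

Either server can read or update the cart, and losing one server loses no session data.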

Vertical vs Horizontal: Side by Side

Aspect	Vertical (Scale Up)	Horizontal (Scale Out)
Cost	Expensive at high end	Cheaper with many servers
Complexity	Simple to implement	Requires architecture changes
Limit	Hardware ceiling	No hard ceiling, but coordination overhead grows
Reliability	Single point of failure	One server fails, others continue
Downtime	Required for upgrades	Can upgrade without downtime
Best for	Databases, early apps	Web apps, APIs, microservices

The Hybrid Approach (What Real Companies Do)

Most successful companies use both:

Step 1: Start with vertical scaling

One powerful server
Simple architecture
Low operational overhead

Step 2: Add horizontal scaling for app servers

Split your app across multiple servers
Keep database on one powerful server (vertical)

Step 3: Eventually scale database horizontally

Read replicas (horizontal)
Sharding (splitting data across servers)
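Both database techniques in Step 3 come down to routing decisions. The sketch below shows one simple way to make them: hash-based shard selection, and sending writes to a primary while rotating reads across replicas. The shard and replica names are hypothetical, and this is not how any particular company does it.

```python
import hashlib
from itertools import cycle

SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]  # hypothetical


def shard_for(user_id):
    """Pick a shard from a stable hash of the key."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


class ReplicatedDatabase:
    """Writes go to the primary; reads rotate across the replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = cycle(replicas)

    def route(self, is_write):
        return self.primary if is_write else next(self._replicas)


print(shard_for(42))  # the same user always maps to the same shard

db = ReplicatedDatabase("primary", ["replica-1", "replica-2"])
print(db.route(is_write=True))
print(db.route(is_write=False))
```

One caveat with plain modulo hashing: changing the number of shards remaps most keys, which is why larger systems tend to use consistent hashing or a lookup table instead.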

Real example: Shopify

2006 (Launch): One server running everything

2010: Scaled up database vertically (bigger server), scaled out app horizontally (more servers)

2024: Thousands of app servers (horizontal) + sharded databases (horizontal)

They didn't start complex. They evolved.

Real Story: Discord's Scaling Journey

2015 (Launch):

Built on Elixir (good for real-time apps)
One database server
A few app servers
Worked great for 10,000 users

2017 (Growth):

Scaled app servers horizontally (easy)
Database hit limits (vertical scaling maxed out)
Had to introduce read replicas

2020 (Massive growth during pandemic):

140 million monthly users
Had to shard their database (horizontal scaling)
Built custom caching layer
Now runs on thousands of servers

The lesson: They scaled up when possible, scaled out when necessary.

When Should You Scale?

Don't scale prematurely.

Scale when you see:

Server CPU consistently above 70%
Response times getting slower (from 100ms to 500ms)
Database queries timing out
Memory running out
You're losing users due to performance
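The measurable signals in this list can be turned into a small check. The 70% CPU figure comes from the list above; the "twice the baseline" latency rule and the 90% memory threshold are assumptions picked for illustration, so tune them to your own system.

```python
def scaling_signals(cpu_percent_samples, latency_ms, baseline_latency_ms,
                    memory_percent):
    """Return the list of warning signs that are currently firing."""
    signals = []
    # "Consistently" means every sample in the window, not a single spike.
    if cpu_percent_samples and all(s > 70 for s in cpu_percent_samples):
        signals.append("cpu consistently above 70%")
    if latency_ms >= 2 * baseline_latency_ms:  # hypothetical threshold
        signals.append("response time degraded")
    if memory_percent >= 90:  # hypothetical threshold
        signals.append("memory nearly exhausted")
    return signals


print(scaling_signals([82, 75, 91], latency_ms=500,
                      baseline_latency_ms=100, memory_percent=60))
print(scaling_signals([82, 40, 91], latency_ms=110,
                      baseline_latency_ms=100, memory_percent=50))
```

The second call returns an empty list: one CPU dip to 40% means the load is spiky rather than sustained, which is a reason to look closer before buying hardware.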

Don't scale when:

Everything works fine
You're guessing about future traffic
You haven't profiled your code
Your issue is bad code, not traffic

Golden rule: Fix inefficient code before adding servers. A slow database query doesn't get faster when you add more app servers.

Your Challenge

Look at your current project (or imagine one):

Is it stateless or stateful?
Where are you storing user sessions?
Could it run on multiple servers right now?
What would break if you added a second server?

Write down the answers. This thinking is what separates junior from senior engineers.

Key Takeaways

Vertical scaling = bigger server (simple but limited)
Horizontal scaling = more servers (more complex, but no single-machine ceiling)
Stateless apps scale horizontally easily
Stateful apps need refactoring first
Most companies use both strategies
Scale when metrics prove you need it, not before

Next up: Part 3: Load Balancing →

You've got multiple servers now. But how do you distribute traffic between them? That's where load balancers come in.

Written by Amika Deshapriya. Making system design simple, one story at a time.
