System Design 101 - Part 2: Scalability Deep Dive
2025-11-01 • 18 min read

Vertical vs Horizontal Scaling Explained
The Restaurant Problem
Your restaurant is packed every night. People are waiting 2 hours for a table.
You have two options:
Option 1: Build a bigger restaurant
- Knock down walls
- Add more tables in the same building
- Get a bigger kitchen
- Hire a super-chef who can cook twice as fast
Option 2: Open more locations
- Keep the original restaurant
- Open 5 smaller branches across the city
- Each location handles its own neighborhood
This is vertical vs horizontal scaling.
Vertical Scaling (Scaling Up)
Simple definition: Making your single server more powerful.
Think of it like: Upgrading from a Honda Civic to a Ferrari. Same driver, same road, just a faster car.
What you're upgrading:
- More CPU cores (4 cores → 16 cores)
- More RAM (8GB → 64GB)
- Faster storage (HDD → SSD)
- Better network cards
When Vertical Scaling Works
Good for:
- Early-stage apps (under 10,000 users)
- Databases (they're hard to split across servers)
- Quick fixes while you plan for horizontal scaling
- Applications not designed for multiple servers
The Problem With Vertical Scaling
You hit a ceiling.
You can't infinitely upgrade one server. Eventually, you reach hardware limits.
Other issues:
- Expensive at the top end: Price climbs faster than capacity, so one server with 256GB RAM often costs more than 4 servers with 64GB each
- Single point of failure: If that one super-server crashes, everything goes down
- Downtime for upgrades: You usually have to shut the server down to add more RAM
- Limited by physics: There's only so much you can pack into one machine
Horizontal Scaling (Scaling Out)
Simple definition: Adding more servers instead of upgrading one.
Think of it like: Instead of one super-fast delivery driver, you hire 10 regular drivers. More coverage, more reliability.
What you're doing:
- Running your app on 10 servers instead of 1
- Each server handles some of the traffic
- If one fails, the other 9 keep working
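Here's a minimal sketch of that idea, assuming a toy pool of 10 servers and simple round-robin selection. The `ServerPool` class and the server names are made up for illustration; real load balancers (covered in Part 3) do a lot more.

```python
from itertools import cycle


class ServerPool:
    """Toy pool: hands out requests to healthy servers in round-robin order."""

    def __init__(self, names):
        self.healthy = {name: True for name in names}
        self._order = cycle(names)

    def mark_down(self, name):
        self.healthy[name] = False

    def pick(self):
        # Try each server at most once per call; skip the ones marked down.
        for _ in range(len(self.healthy)):
            name = next(self._order)
            if self.healthy[name]:
                return name
        raise RuntimeError("no healthy servers left")


pool = ServerPool([f"server-{i}" for i in range(1, 11)])
pool.mark_down("server-3")  # one server crashes

# Traffic keeps flowing to the remaining 9 servers.
served = [pool.pick() for _ in range(18)]
```

The crashed server simply stops receiving requests. Nothing else in the pool has to change.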
When Horizontal Scaling Works
Good for:
- Apps with 10,000+ users
- Global applications (users in different countries)
- Services that need 99.9%+ uptime
- Unpredictable traffic spikes
The Catch: Your App Must Be Stateless
Here's where it gets tricky.
Stateful app: Server remembers things about each user.
- Your login session stored in server memory
- Your shopping cart saved on a specific server
- If that server dies, your data is gone
Stateless app: Server doesn't remember anything.
- User sends authentication token with every request
- Session data stored in a separate database/cache (Redis)
- Shopping cart saved in database, not server memory
- Any server can handle any request
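To make the difference concrete, here's a small sketch. The class names are hypothetical, and a plain Python dict stands in for the shared store (Redis or a database in a real setup).

```python
class StatefulServer:
    """Keeps carts in its own memory, so only this server knows about them."""

    def __init__(self):
        self.carts = {}

    def add_to_cart(self, user, item):
        self.carts.setdefault(user, []).append(item)

    def get_cart(self, user):
        return self.carts.get(user, [])


class StatelessServer:
    """Keeps nothing locally; reads and writes a store shared by all servers."""

    def __init__(self, shared_store):
        self.store = shared_store  # stand-in for Redis or a database

    def add_to_cart(self, user, item):
        self.store.setdefault(user, []).append(item)

    def get_cart(self, user):
        return self.store.get(user, [])


# Stateful: the cart lives on server A only.
a, b = StatefulServer(), StatefulServer()
a.add_to_cart("john", "book")
stateful_result = b.get_cart("john")  # server B has never seen John's cart

# Stateless: both servers talk to the same store.
shared = {}
c, d = StatelessServer(shared), StatelessServer(shared)
c.add_to_cart("john", "book")
stateless_result = d.get_cart("john")  # server D finds the cart
```

In the stateful version, John's cart disappears the moment his request lands on a different server. In the stateless version, it doesn't matter which server answers.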
Making Your App Stateless
Bad (Stateful):
User logs in → Server A stores "John is logged in" in memory
User refreshes page → Request goes to Server B → Server B says "Who's John?"
Good (Stateless):
User logs in → Server gives John a token (like a ticket)
John sends token with every request
Any server can verify the token and know it's John
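The token flow above could look roughly like this. It's a simplified sketch using an HMAC signature from Python's standard library, not a full JWT implementation. `SECRET_KEY`, `issue_token`, and `verify_token` are illustrative names, and the sketch assumes every server is configured with the same secret.

```python
import base64
import hashlib
import hmac

# Hypothetical secret shared by all servers; load it from config in practice.
SECRET_KEY = b"demo-secret-shared-by-all-servers"


def _sign(payload: str) -> str:
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(username: str) -> str:
    """Return 'payload.signature'; the signature proves we issued the payload."""
    payload = base64.urlsafe_b64encode(username.encode()).decode()
    return f"{payload}.{_sign(payload)}"


def verify_token(token: str):
    """Return the username if the signature checks out, otherwise None."""
    payload, _, signature = token.partition(".")
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(signature.encode(), _sign(payload).encode()):
        return None
    return base64.urlsafe_b64decode(payload.encode()).decode()


# "Server A" issues the token at login; "Server B" can verify it later
# without ever having seen John, because both share SECRET_KEY.
token = issue_token("john")
user = verify_token(token)
```

This sketch leaves out things a production token needs, such as an expiry time. In a real app you'd likely reach for an established JWT library instead of rolling your own.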
Where to store session data:
- Redis (in-memory cache)
- Database
- JWTs (signed data sent with each request; readable by anyone unless you also encrypt it)
Vertical vs Horizontal: Side by Side
| Aspect | Vertical (Scale Up) | Horizontal (Scale Out) |
|---|---|---|
| Cost | Expensive at high end | Cheaper with many servers |
| Complexity | Simple to implement | Requires architecture changes |
| Limit | Hardware ceiling | No theoretical limit |
| Reliability | Single point of failure | One server fails, others continue |
| Downtime | Required for upgrades | Can upgrade without downtime |
| Best for | Databases, early apps | Web apps, APIs, microservices |
The Hybrid Approach (What Real Companies Do)
Most successful companies use both:
Step 1: Start with vertical scaling
- One powerful server
- Simple architecture
- Low operational overhead
Step 2: Add horizontal scaling
- Split your app across multiple servers
- Keep database on one powerful server (vertical)
Step 3: Scale the database
- Read replicas (horizontal)
- Sharding (splitting data across servers)
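Read replicas and sharding can be sketched in a few lines. This is an illustration under simple assumptions: the shard and replica names are hypothetical, sharding uses a plain hash-modulo scheme, and a query counts as a read if it starts with SELECT.

```python
import hashlib
from itertools import cycle

SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]  # hypothetical


def shard_for(user_id: str) -> str:
    """Map a user to one shard with a stable hash (same input, same shard)."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


class ReplicatedDatabase:
    """Routes writes to the primary and spreads reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = cycle(replicas)

    def route(self, query: str) -> str:
        # Naive read detection, good enough for a sketch.
        is_read = query.lstrip().upper().startswith("SELECT")
        return next(self._replicas) if is_read else self.primary


db = ReplicatedDatabase("primary", ["replica-1", "replica-2"])
write_target = db.route("INSERT INTO carts VALUES ('john', 'book')")
read_target = db.route("SELECT * FROM carts WHERE user = 'john'")
```

One caveat with hash-modulo sharding: changing the number of shards remaps most keys, which is why larger systems often use consistent hashing or a lookup table instead.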
2006 (Launch): One server running everything
2010: Scaled up database vertically (bigger server), scaled out app horizontally (more servers)
2024: Thousands of app servers (horizontal) + sharded databases (horizontal)
They didn't start complex. They evolved.
Real Story: Discord's Scaling Journey
2015 (Launch):
- Built on Elixir (good for real-time apps)
- One database server
- A few app servers
- Worked great for 10,000 users
As it grew:
- Scaled app servers horizontally (easy)
- Database hit limits (vertical scaling maxed out)
- Had to introduce read replicas
At scale:
- 140 million monthly users
- Had to shard their database (horizontal scaling)
- Built custom caching layer
- Now runs on thousands of servers
When Should You Scale?
Don't scale prematurely.
Scale when you see:
- Server CPU consistently above 70%
- Response times getting slower (from 100ms to 500ms)
- Database queries timing out
- Memory running out
- You're losing users due to performance
Don't scale when:
- Everything works fine
- You're guessing about future traffic
- You haven't profiled your code
- Your issue is bad code, not traffic
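The "scale when you see" signals can be turned into a simple check. This is a sketch, not a monitoring tool. The `Metrics` fields are hypothetical, the 70% CPU threshold and the 100ms to 500ms (5x) slowdown come from the list above, and the 10% free-memory cutoff is an assumption picked for illustration.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    avg_cpu_percent: float       # sustained average, not a momentary spike
    latency_ms: float            # current typical response time
    baseline_latency_ms: float   # response time when the system was healthy
    query_timeouts: int          # database timeouts in the sampling window
    free_memory_percent: float


def scaling_signals(m: Metrics) -> list[str]:
    """Return the warning signs present in the metrics (empty list = don't scale)."""
    signals = []
    if m.avg_cpu_percent > 70:
        signals.append("CPU consistently above 70%")
    if m.latency_ms >= 5 * m.baseline_latency_ms:
        signals.append("response times about 5x slower than baseline")
    if m.query_timeouts > 0:
        signals.append("database queries timing out")
    if m.free_memory_percent < 10:  # assumed cutoff
        signals.append("memory running out")
    return signals


healthy = scaling_signals(Metrics(35, 110, 100, 0, 40))
struggling = scaling_signals(Metrics(85, 500, 100, 3, 5))
```

An empty list means the metrics give you no reason to scale yet. Even when signals do show up, profile first: slow code produces the same symptoms as too much traffic.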
Your Challenge
Look at your current project (or imagine one):
- Is it stateless or stateful?
- Where are you storing user sessions?
- Could it run on multiple servers right now?
- What would break if you added a second server?
Key Takeaways
- Vertical scaling = bigger server (simple but limited)
- Horizontal scaling = more servers (more complex, but no hard ceiling)
- Stateless apps scale horizontally easily
- Stateful apps need refactoring first
- Most companies use both strategies
- Scale when metrics prove you need it, not before
Next up: Part 3: Load Balancing →
You've got multiple servers now. But how do you distribute traffic between them? That's where load balancers come in.
Written by Amika Deshapriya. Making system design simple, one story at a time.