
System Design 101 - Part 2: Scalability Deep Dive

2025-11-01 · 18 min read

By Amika Deshapriya

Vertical vs Horizontal Scaling Explained

← Part 1: Why System Design Matters | Part 2 (You are here) | Part 3: Load Balancing →


The Restaurant Problem

Your restaurant is packed every night. People are waiting 2 hours for a table.

You have two options:

Option 1: Build a bigger restaurant

  • Knock down walls
  • Add more tables in the same building
  • Get a bigger kitchen
  • Hire a super-chef who can cook twice as fast
Option 2: Open more locations
  • Keep the original restaurant
  • Open 5 smaller branches across the city
  • Each location handles its own neighborhood
Both solve the problem. But they work completely differently.

This is vertical vs horizontal scaling.


Vertical Scaling (Scaling Up)

Simple definition: Making your single server more powerful.

Think of it like: Upgrading from a Honda Civic to a Ferrari. Same driver, same road, just a faster car.

What you're upgrading:

  • More CPU cores (4 cores → 16 cores)
  • More RAM (8GB → 64GB)
  • Faster storage (HDD → SSD)
  • Better network cards
Real example: Your app runs on one server with 4GB RAM. It's slowing down. You upgrade to 16GB RAM. Problem solved—for now.

When Vertical Scaling Works

Good for:

  • Early-stage apps (under 10,000 users)
  • Databases (they're hard to split across servers)
  • Quick fixes while you plan for horizontal scaling
  • Applications not designed for multiple servers
Stack Overflow's approach: For years, they served millions of users from just a handful of very powerful servers. They got a lot of mileage out of scaling up before adding more machines.

The Problem With Vertical Scaling

You hit a ceiling.

You can't infinitely upgrade one server. Eventually, you reach hardware limits.

Other issues:

  • Expensive at the top end: One 256GB server can cost far more than four 64GB servers, because top-tier hardware is priced at a premium
  • Single point of failure: If that one super-server crashes, everything goes down
  • Downtime for upgrades: Adding RAM usually means shutting the server down or restarting it
  • Limited by physics: There's only so much you can pack into one machine
When you need more: This is when you switch to horizontal scaling.


Horizontal Scaling (Scaling Out)

Simple definition: Adding more servers instead of upgrading one.

Think of it like: Instead of one super-fast delivery driver, you hire 10 regular drivers. More coverage, more reliability.

What you're doing:

  • Running your app on 10 servers instead of 1
  • Each server handles some of the traffic
  • If one fails, the other 9 keep working
Real example: Netflix runs thousands of servers across the globe. If 100 servers crash, you don't even notice—the remaining thousands keep streaming.
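
To make "if one fails, the others keep working" concrete, here is a minimal sketch of a pool that hands out servers in turn and skips any that are marked down. The `RoundRobinPool` class and the `app-1` to `app-10` names are made up for illustration; a real load balancer (covered in Part 3) does much more than this.

```python
from itertools import cycle


class RoundRobinPool:
    """Hands out servers in turn, skipping any marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        # Walk the ring at most one full lap looking for a healthy server.
        for _ in range(len(self.servers)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers available")


# 10 hypothetical app servers, one of which has crashed.
pool = RoundRobinPool([f"app-{i}" for i in range(1, 11)])
pool.mark_down("app-3")

picks = [pool.next_server() for _ in range(9)]
print(picks)  # app-3 never appears; the other 9 share the traffic
```

The traffic that would have gone to the dead server is simply spread across the rest. No single machine is special, which is the whole point.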

When Horizontal Scaling Works

Good for:

  • Apps with 10,000+ users
  • Global applications (users in different countries)
  • Services that need 99.9%+ uptime
  • Unpredictable traffic spikes
Netflix's approach: They don't have one giant server. They have thousands of smaller servers. During peak hours (8 PM when everyone's watching), they automatically add more servers. At 3 AM, they scale back down.
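
How does "automatically add more servers" get decided? One common rule is target tracking: pick a CPU level you want the fleet to sit at, then size the fleet proportionally. The sketch below is a simplified version of that idea, not Netflix's actual system. The function name, the 60% target, and the min/max bounds are all assumptions chosen for the example.

```python
import math


def desired_servers(current_servers, current_cpu_pct, target_cpu_pct=60,
                    min_servers=2, max_servers=100):
    """Target-tracking rule: size the fleet so average CPU lands near the target.

    desired = ceil(current_servers * current_cpu / target_cpu),
    clamped to [min_servers, max_servers].
    """
    if current_servers < 1 or target_cpu_pct <= 0:
        raise ValueError("need at least one server and a positive target")
    raw = math.ceil(current_servers * current_cpu_pct / target_cpu_pct)
    return max(min_servers, min(max_servers, raw))


print(desired_servers(10, 90))  # evening peak: 15
print(desired_servers(10, 15))  # 3 AM: 3
```

The floor (`min_servers`) matters as much as the formula: even at 3 AM you want more than one server, or you are back to a single point of failure.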

The Catch: Your App Must Be Stateless

Here's where it gets tricky.

Stateful app: Server remembers things about each user.

  • Your login session stored in server memory
  • Your shopping cart saved on a specific server
  • If that server dies, your data is gone
Problem: With 10 servers, user 1 might connect to server A, then their next request goes to server B—which doesn't know who they are.

Stateless app: Server doesn't remember anything.

  • User sends authentication token with every request
  • Session data stored in a separate database/cache (Redis)
  • Shopping cart saved in database, not server memory
  • Any server can handle any request
This is the key to horizontal scaling.
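
Here is the difference in a few lines of code. This is a toy simulation, not a web framework: the class names are invented, and a plain dictionary stands in for the shared store (Redis or a database in real life).

```python
class StatefulServer:
    """Keeps sessions in its own process memory."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # invisible to other servers, lost if this one dies

    def login(self, session_id, user):
        self.sessions[session_id] = user

    def whoami(self, session_id):
        return self.sessions.get(session_id)


class StatelessServer:
    """Keeps nothing locally; every lookup goes to a shared store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store  # stand-in for Redis or a database

    def login(self, session_id, user):
        self.store[session_id] = user

    def whoami(self, session_id):
        return self.store.get(session_id)


# Stateful: login lands on A, the next request lands on B.
a, b = StatefulServer("A"), StatefulServer("B")
a.login("s1", "john")
print(b.whoami("s1"))  # None: B has never seen this session

# Stateless: both servers read and write the same store.
shared_store = {}
c, d = StatelessServer("C", shared_store), StatelessServer("D", shared_store)
c.login("s1", "john")
print(d.whoami("s1"))  # john
```

Notice that the stateless servers are interchangeable. You can add, remove, or restart them without any user noticing.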

Making Your App Stateless

Bad (Stateful):


User logs in → Server A stores "John is logged in" in memory
User refreshes page → Request goes to Server B → Server B says "Who's John?"

Good (Stateless):


User logs in → Server gives John a token (like a ticket)
John sends token with every request
Any server can verify the token and know it's John
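
What does "any server can verify the token" look like? Below is a minimal sketch using an HMAC signature from Python's standard library. It is a simplified stand-in for a real token format such as JWT, not a replacement for one. The `SECRET_KEY` value, the function names, and the `payload.signature` layout are assumptions made for this example; in production you would use a vetted library.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical secret. Every app server is configured with the same key.
SECRET_KEY = b"demo-secret-change-me"


def _b64(data):
    return base64.urlsafe_b64encode(data).decode("ascii")


def issue_token(user, ttl_seconds=3600, now=None):
    """Return 'payload.signature', where the signature covers the payload."""
    issued_at = time.time() if now is None else now
    payload = json.dumps({"user": user, "exp": issued_at + ttl_seconds}).encode()
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return _b64(payload) + "." + _b64(signature)


def verify_token(token, now=None):
    """Return the user name if the token is authentic and unexpired, else None."""
    current = time.time() if now is None else now
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:  # wrong number of parts or invalid base64
        return None
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):  # constant-time comparison
        return None
    claims = json.loads(payload)
    if claims["exp"] < current:
        return None
    return claims["user"]


token = issue_token("john")
print(verify_token(token))  # john

# Swap in a different payload but keep the old signature.
forged = _b64(b'{"user": "admin", "exp": 9999999999}') + "." + token.split(".")[1]
print(verify_token(forged))  # None: the signature does not match the new payload
```

No server had to remember John. Verification needs only the token and the shared key, so the request can land anywhere. Note that the payload is only signed, not hidden: anyone holding the token can read it.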

Where to store session data:

  • Redis (in-memory cache)
  • Database
  • JWTs (signed tokens sent with each request; by default they are signed, not encrypted, so keep secrets out of them)

Vertical vs Horizontal: Side by Side

| Aspect | Vertical (Scale Up) | Horizontal (Scale Out) |
|---|---|---|
| Cost | Expensive at high end | Cheaper with many servers |
| Complexity | Simple to implement | Requires architecture changes |
| Limit | Hardware ceiling | No theoretical limit |
| Reliability | Single point of failure | One server fails, others continue |
| Downtime | Required for upgrades | Can upgrade without downtime |
| Best for | Databases, early apps | Web apps, APIs, microservices |

The Hybrid Approach (What Real Companies Do)

Most successful companies use both:

Step 1: Start with vertical scaling

  • One powerful server
  • Simple architecture
  • Low operational overhead
Step 2: Add horizontal scaling for app servers
  • Split your app across multiple servers
  • Keep database on one powerful server (vertical)
Step 3: Eventually scale database horizontally
  • Read replicas (horizontal)
  • Sharding (splitting data across servers)
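
Step 3 is easier to picture with code. The sketch below routes each user to a shard with a stable hash, sends writes to that shard's primary, and rotates reads across its replicas. Everything here is hypothetical: the shard count, the node names like `shard0-primary`, and the `ShardRouter` class are made up to show the idea, not how any particular company does it.

```python
import hashlib
import itertools

SHARD_COUNT = 4  # hypothetical number of shards


def shard_for(user_id, shard_count=SHARD_COUNT):
    """Map a key to a shard using a stable hash.

    Python's built-in hash() is salted per process for strings, so it would
    send the same user to different shards on different servers.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardRouter:
    """Writes go to a shard's primary; reads rotate across its replicas."""

    def __init__(self, shard_count, replicas_per_shard):
        self.shard_count = shard_count
        self._replica_cycles = [
            itertools.cycle(
                [f"shard{s}-replica{r}" for r in range(replicas_per_shard)]
            )
            for s in range(shard_count)
        ]

    def node_for_write(self, user_id):
        return f"shard{shard_for(user_id, self.shard_count)}-primary"

    def node_for_read(self, user_id):
        shard = shard_for(user_id, self.shard_count)
        return next(self._replica_cycles[shard])


router = ShardRouter(shard_count=SHARD_COUNT, replicas_per_shard=2)
print(router.node_for_write("user-42"))
print(router.node_for_read("user-42"))
print(router.node_for_read("user-42"))  # same shard, the other replica
```

Two caveats. Replicas can lag behind the primary, so a read right after a write may return stale data. And with simple modulo hashing, changing the shard count moves most keys to a different shard, which is one reason resharding is painful.
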
Real example: Shopify

2006 (Launch): One server running everything

2010: Scaled up database vertically (bigger server), scaled out app horizontally (more servers)

2024: Thousands of app servers (horizontal) + sharded databases (horizontal)

They didn't start complex. They evolved.


Real Story: Discord's Scaling Journey

2015 (Launch):

  • Built on Elixir (good for real-time apps)
  • One database server
  • A few app servers
  • Worked great for 10,000 users
2017 (Growth):
  • Scaled app servers horizontally (easy)
  • Database hit limits (vertical scaling maxed out)
  • Had to introduce read replicas
2020 (Massive growth during pandemic):
  • 140 million monthly users
  • Had to shard their database (horizontal scaling)
  • Built custom caching layer
  • Now runs on thousands of servers
The lesson: They scaled up when possible, scaled out when necessary.


When Should You Scale?

Don't scale prematurely.

Scale when you see:

  • Server CPU consistently above 70%
  • Response times getting slower (from 100ms to 500ms)
  • Database queries timing out
  • Memory running out
  • You're losing users due to performance
Don't scale when:
  • Everything works fine
  • You're guessing about future traffic
  • You haven't profiled your code
  • Your issue is bad code, not traffic
Golden rule: Fix inefficient code before adding servers. A slow database query doesn't get faster with more servers.
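
The word "consistently" in that checklist is doing real work. One spike is not a reason to scale. Here is a minimal sketch of turning the signals above into a check. The 70% CPU and 500ms thresholds come from the list above; the function names and the "80% of samples" rule for what counts as sustained are assumptions for illustration.

```python
def sustained_above(samples, threshold, min_fraction=0.8):
    """True if at least min_fraction of the samples exceed the threshold."""
    if not samples:
        return False
    over = sum(1 for value in samples if value > threshold)
    return over / len(samples) >= min_fraction


def scaling_signals(cpu_percent_samples, latency_ms_samples,
                    cpu_threshold=70, latency_threshold_ms=500):
    """Return the list of signals that say 'look into scaling'."""
    signals = []
    if sustained_above(cpu_percent_samples, cpu_threshold):
        signals.append("cpu")
    if sustained_above(latency_ms_samples, latency_threshold_ms):
        signals.append("latency")
    return signals


# CPU is high in every sample, latency is fine.
print(scaling_signals([75, 80, 90, 72, 85], [120, 110, 130, 100, 105]))  # ['cpu']

# One CPU spike in an otherwise quiet window.
print(scaling_signals([30, 95, 40, 35, 20], [100, 100, 100, 100, 100]))  # []
```

A signal here means "go profile", not "go buy servers". If the profiler points at one slow query, fix the query first.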


Your Challenge

Look at your current project (or imagine one):

  • Is it stateless or stateful?
  • Where are you storing user sessions?
  • Could it run on multiple servers right now?
  • What would break if you added a second server?
Write down the answers. This thinking is what separates junior from senior engineers.

Key Takeaways

  • Vertical scaling = bigger server (simple but limited)
  • Horizontal scaling = more servers (complex, but no single-machine ceiling)
  • Stateless apps scale horizontally easily
  • Stateful apps need refactoring first
  • Most companies use both strategies
  • Scale when metrics prove you need it, not before

Next up: Part 3: Load Balancing →

You've got multiple servers now. But how do you distribute traffic between them? That's where load balancers come in.


Written by Amika Deshapriya. Making system design simple, one story at a time.
