Database Fundamentals for System Design

← Part 4: Caching | Part 5 (You are here)

The Filing Cabinet Problem

Imagine you run a company with customer records.

Option 1: Filing cabinet with folders

Everything organized in alphabetical order
Folders for each customer
Related documents stapled together
Easy to find "John Smith's" complete file

Option 2: Giant box of loose papers

Throw all papers in randomly
Super fast to add new papers
Nightmare to find anything
But if you need "all red papers," just dump and sort

Option 1 is SQL databases. Option 2 is NoSQL databases.

Both work. For different problems.

SQL vs NoSQL: The Real Difference

SQL Databases (Structured)

Think of it like: Excel spreadsheets with strict rules.

How it works:

Data stored in tables (like spreadsheets)
Each row is a record (one user, one product)
Columns are properties (name, email, age)
Tables connect to each other (users have orders, orders have products)

Examples: PostgreSQL, MySQL, SQL Server

Best for:

Banking apps (money must be accurate)
E-commerce (orders relate to users and products)
Anything where relationships matter
Data that has consistent structure

Real example: Amazon orders

Users table (customer info)
Orders table (order details)
Products table (what was bought)
All connected together

NoSQL Databases (Flexible)

Think of it like: Folders with documents that can look different.

How it works:

Data stored as documents (like JSON)
Each document can have different fields
No strict relationships required
Very fast to read/write

Examples: MongoDB, DynamoDB, Cassandra

Best for:

Social media posts (varies wildly)
Real-time analytics
Storing logs
Data that changes structure often

Real example: Instagram posts


Post 1: {image, caption, likes, location}
Post 2: {video, caption, likes, tagged_users}
Post 3: {carousel_images, caption, likes}

Each post is different. NoSQL handles this easily.

The ACID Rules (For SQL)

ACID sounds scary. It's not. It's just four promises a database makes:

A = Atomicity (All or Nothing)

Simple version: Either the whole thing happens or none of it happens.

Example: Transferring money


Take $100 from Account A
Give $100 to Account B

If the database crashes after step 1, it rolls back. You don't lose $100.

Why it matters: No half-finished actions. Everything stays consistent.

C = Consistency (Rules Never Break)

Simple version: Data follows all the rules, always.

Example: Age must be a positive number. Email must have an @.

If you try to save invalid data, the database rejects it.

Why it matters: Garbage data can't get in.

I = Isolation (No Interference)

Simple version: Two people doing things at the same time don't mess each other up.

Example:


User A: Read balance ($100) → Add $50 → Save ($150)
User B: Read balance ($100) → Add $30 → Save ($130)

Without isolation, one update gets lost. With isolation, both happen correctly.

Why it matters: Concurrent users don't corrupt data.

D = Durability (Never Lose Data)

Simple version: Once saved, data doesn't disappear—even if server crashes.

Example: You click "Post" on social media. Even if the server immediately crashes, your post is saved.

Why it matters: Power outages don't destroy data.

Database Replication (Making Copies)

The problem: One database server is a single point of failure. If it dies, everything stops.

The solution: Keep copies of your data on multiple servers.

Primary-Replica Setup

How it works:


Primary Database (Main)
  ↓
Copies data to →
  ↓
Replica 1, Replica 2, Replica 3

Who does what:

Primary: Handles all writes (create, update, delete)
Replicas: Handle reads (just fetching data)

Why this works:

Most apps read way more than they write (90% reads, 10% writes)
Spread read load across many replicas
If primary dies, promote a replica to primary

Real example: Instagram

All new posts go to primary database
When you scroll your feed (reading), it comes from replicas
Millions of reads, thousands of writes

The Trade-off: Eventual Consistency

The issue: Replicas aren't updated instantly. There's a tiny delay (milliseconds to seconds).

What this means:


You post a photo → Saved to primary
Friend refreshes feed → Reads from replica (might not have new photo yet)
2 seconds later → Replica catches up → Friend sees your photo

For social media, this is fine. For banking, it's not.

Database Sharding (Splitting Data)

The problem: Your database is too big for one server. 100 million users won't fit.

The solution: Split data across multiple databases (shards).

How Sharding Works

Example: User database


Shard 1: Users A-M (50 million users)
Shard 2: Users N-Z (50 million users)

When someone requests "John Smith," the app knows to check Shard 1 (because J is between A-M).

Common Sharding Strategies

1. By ID range:

Users 1-10M → Shard 1
Users 10M-20M → Shard 2

Problem: New users all go to latest shard (unbalanced)

2. By hash:

Hash user ID → Gets a number
Number determines shard
Evenly distributed

3. By geography:

US users → US database
EU users → EU database
Faster (data is closer to users)

4. By feature:

User profiles → Database 1
User posts → Database 2
Orders → Database 3

The Downside of Sharding

Complexity: You can't easily join data across shards.

Example:


User "John" is in Shard 1
His orders are in Shard 2
To get "John's orders," you need to query both and combine results

When to shard: Only when you absolutely have to. It adds a lot of complexity.

Choosing the Right Database

Use SQL When:

Data has clear relationships (users → orders → products)
Accuracy is critical (banking, inventory)
You need complex queries (reports, analytics)
Data structure is consistent

Examples: E-commerce, banking, booking systems

Use NoSQL When:

Data structure varies (social media posts)
Need to scale writes massively (logs, analytics)
Relationships aren't important
Speed matters more than absolute consistency

Examples: Social media feeds, IoT sensor data, real-time analytics

Use Both When:

User accounts: SQL (structured, important)
User activity logs: NoSQL (high volume, flexible)
Product catalog: SQL (relationships)
Product reviews: NoSQL (varies)

Real companies do this: Netflix, Uber, Airbnb all use multiple database types.

Real Story: Uber's Database Strategy

The challenge: Track millions of rides in real time, across 10,000+ cities.

Their solution:

Ride data: Sharded SQL databases (by city)

- Each city is a shard - Rides in New York → New York database - Rides in London → London database

Location tracking: NoSQL (Cassandra)

- Millions of GPS updates per second - Don't need to be perfectly accurate - Need to be fast

User profiles: SQL with replicas

- Important, structured data - Read-heavy (people check their profile often) - Replicas handle the load

Key lesson: Different data has different needs. Use the right tool for each job.

Common Database Mistakes

Mistake 1: No Indexes

Problem: Database searches through every row (slow).

Fix: Add indexes on columns you search often.

Analogy: Like a book without an index—you have to read every page to find something.

Mistake 2: Not Using Connection Pools

Problem: Opening a new database connection for every request (slow).

Fix: Reuse connections. Keep a pool of 10-20 connections ready.

Mistake 3: Storing Everything in One Table

Problem: Table gets huge, queries get slow.

Fix: Split into related tables. One table for users, one for posts, one for comments.

Mistake 4: Not Planning for Backups

Problem: Database corrupts, no backup, lose everything.

Fix: Automated daily backups. Test restoring from backup.

Mistake 5: Premature Sharding

Problem: Sharding before you actually need it (millions of users).

Fix: Scale vertically first. Add replicas. Shard only when necessary.

Quick Database Checklist

Before launching:

[ ] Backups configured (automated, tested)
[ ] Indexes on frequently queried columns
[ ] Connection pooling enabled
[ ] Monitoring set up (CPU, disk, query times)
[ ] At least one replica for redundancy

When you grow:

[ ] Add read replicas when read load is high
[ ] Consider caching before adding more databases
[ ] Shard only when single database can't handle it
[ ] Monitor slow queries and optimize them

Your Challenge

Think about an app you want to build (or already built):

What data needs to be stored?
SQL or NoSQL? Why?
What relationships exist in your data?
What happens if your database crashes?
At what point would you need replicas? Sharding?

Write it down. This is the thinking that makes good architects.

Week 1 Complete: What You've Learned

Part 1: Why system design matters (career, performance, cost)

Part 2: Vertical vs horizontal scaling, stateless architecture

Part 3: Load balancing algorithms and high availability

Part 4: Caching strategies that make everything 10x faster

Part 5: Database fundamentals, replication, and sharding

You now understand how apps like Netflix, Instagram, and Uber actually work behind the scenes.

What's Next?

Week 2: System Architecture Patterns (Coming Soon)

Monolith vs Microservices
Event-driven architecture
API design patterns
When to use each approach
Real migration stories

Practice challenge: Design a simplified Twitter:

How do you store tweets?
How do you show user feeds?
How do you handle millions of users?
Draw the architecture

Share your solution—I'd love to see what you come up with!

Key Takeaways

SQL = structured, relationships, accurate (use for most things)
NoSQL = flexible, fast, scalable (use for specific needs)
ACID = promises that SQL databases make (reliability)
Replication = copies of data for speed and backup
Sharding = splitting data across databases (only when necessary)
Most apps use both SQL and NoSQL for different purposes
Start simple, scale when metrics prove you need to
Always have backups and monitoring

Congratulations! You've completed Week 1 of System Design 101.

You now know more about system design than 80% of junior developers.

Written by Amika Deshapriya Making system design simple, one story at a time.

Connect: LinkedIn | GitHub | Newsletter