Back to Blog

System Design 101 - Part 5: Database Fundamentals

2025-11-085 min read

AD
By Amika Deshapriya
System Design 101 • Part 5
System Design 101 - Part 5: Database Fundamentals

Database Fundamentals for System Design

← Part 4: Caching | Part 5 (You are here)


The Filing Cabinet Problem

Imagine you run a company with customer records.

Option 1: Filing cabinet with folders

  • Everything organized in alphabetical order
  • Folders for each customer
  • Related documents stapled together
  • Easy to find "John Smith's" complete file
Option 2: Giant box of loose papers
  • Throw all papers in randomly
  • Super fast to add new papers
  • Nightmare to find anything
  • But if you need "all red papers," just dump and sort
Option 1 is SQL databases. Option 2 is NoSQL databases.

Both work. For different problems.


SQL vs NoSQL: The Real Difference

SQL Databases (Structured)

Think of it like: Excel spreadsheets with strict rules.

How it works:

  • Data stored in tables (like spreadsheets)
  • Each row is a record (one user, one product)
  • Columns are properties (name, email, age)
  • Tables connect to each other (users have orders, orders have products)
Examples: PostgreSQL, MySQL, SQL Server

Best for:

  • Banking apps (money must be accurate)
  • E-commerce (orders relate to users and products)
  • Anything where relationships matter
  • Data that has consistent structure
Real example: Amazon orders
  • Users table (customer info)
  • Orders table (order details)
  • Products table (what was bought)
  • All connected together

NoSQL Databases (Flexible)

Think of it like: Folders with documents that can look different.

How it works:

  • Data stored as documents (like JSON)
  • Each document can have different fields
  • No strict relationships required
  • Very fast to read/write
Examples: MongoDB, DynamoDB, Cassandra

Best for:

  • Social media posts (varies wildly)
  • Real-time analytics
  • Storing logs
  • Data that changes structure often
Real example: Instagram posts

Post 1: {image, caption, likes, location}
Post 2: {video, caption, likes, tagged_users}
Post 3: {carousel_images, caption, likes}

Each post is different. NoSQL handles this easily.


The ACID Rules (For SQL)

ACID sounds scary. It's not. It's just four promises a database makes:

A = Atomicity (All or Nothing)

Simple version: Either the whole thing happens or none of it happens.

Example: Transferring money


Take $100 from Account A
Give $100 to Account B

If the database crashes after step 1, it rolls back. You don't lose $100.

Why it matters: No half-finished actions. Everything stays consistent.

C = Consistency (Rules Never Break)

Simple version: Data follows all the rules, always.

Example: Age must be a positive number. Email must have an @.

If you try to save invalid data, the database rejects it.

Why it matters: Garbage data can't get in.

I = Isolation (No Interference)

Simple version: Two people doing things at the same time don't mess each other up.

Example:


User A: Read balance ($100) → Add $50 → Save ($150)
User B: Read balance ($100) → Add $30 → Save ($130)

Without isolation, one update gets lost. With isolation, both happen correctly.

Why it matters: Concurrent users don't corrupt data.

D = Durability (Never Lose Data)

Simple version: Once saved, data doesn't disappear—even if server crashes.

Example: You click "Post" on social media. Even if the server immediately crashes, your post is saved.

Why it matters: Power outages don't destroy data.


Database Replication (Making Copies)

The problem: One database server is a single point of failure. If it dies, everything stops.

The solution: Keep copies of your data on multiple servers.

Primary-Replica Setup

How it works:


Primary Database (Main)
  ↓
Copies data to →
  ↓
Replica 1, Replica 2, Replica 3

Who does what:

  • Primary: Handles all writes (create, update, delete)
  • Replicas: Handle reads (just fetching data)
Why this works:
  • Most apps read way more than they write (90% reads, 10% writes)
  • Spread read load across many replicas
  • If primary dies, promote a replica to primary
Real example: Instagram
  • All new posts go to primary database
  • When you scroll your feed (reading), it comes from replicas
  • Millions of reads, thousands of writes

The Trade-off: Eventual Consistency

The issue: Replicas aren't updated instantly. There's a tiny delay (milliseconds to seconds).

What this means:


You post a photo → Saved to primary
Friend refreshes feed → Reads from replica (might not have new photo yet)
2 seconds later → Replica catches up → Friend sees your photo

For social media, this is fine. For banking, it's not.


Database Sharding (Splitting Data)

The problem: Your database is too big for one server. 100 million users won't fit.

The solution: Split data across multiple databases (shards).

How Sharding Works

Example: User database


Shard 1: Users A-M (50 million users)
Shard 2: Users N-Z (50 million users)

When someone requests "John Smith," the app knows to check Shard 1 (because J is between A-M).

Common Sharding Strategies

1. By ID range:

  • Users 1-10M → Shard 1
  • Users 10M-20M → Shard 2
Problem: New users all go to latest shard (unbalanced)

2. By hash:

  • Hash user ID → Gets a number
  • Number determines shard
  • Evenly distributed
3. By geography:
  • US users → US database
  • EU users → EU database
  • Faster (data is closer to users)
4. By feature:
  • User profiles → Database 1
  • User posts → Database 2
  • Orders → Database 3

The Downside of Sharding

Complexity: You can't easily join data across shards.

Example:


User "John" is in Shard 1
His orders are in Shard 2
To get "John's orders," you need to query both and combine results

When to shard: Only when you absolutely have to. It adds a lot of complexity.


Choosing the Right Database

Use SQL When:

  • Data has clear relationships (users → orders → products)
  • Accuracy is critical (banking, inventory)
  • You need complex queries (reports, analytics)
  • Data structure is consistent
Examples: E-commerce, banking, booking systems

Use NoSQL When:

  • Data structure varies (social media posts)
  • Need to scale writes massively (logs, analytics)
  • Relationships aren't important
  • Speed matters more than absolute consistency
Examples: Social media feeds, IoT sensor data, real-time analytics

Use Both When:

  • User accounts: SQL (structured, important)
  • User activity logs: NoSQL (high volume, flexible)
  • Product catalog: SQL (relationships)
  • Product reviews: NoSQL (varies)
Real companies do this: Netflix, Uber, Airbnb all use multiple database types.

Real Story: Uber's Database Strategy

The challenge: Track millions of rides in real time, across 10,000+ cities.

Their solution:

  • Ride data: Sharded SQL databases (by city)
- Each city is a shard - Rides in New York → New York database - Rides in London → London database
  • Location tracking: NoSQL (Cassandra)
- Millions of GPS updates per second - Don't need to be perfectly accurate - Need to be fast

  • User profiles: SQL with replicas
- Important, structured data - Read-heavy (people check their profile often) - Replicas handle the load

Key lesson: Different data has different needs. Use the right tool for each job.


Common Database Mistakes

Mistake 1: No Indexes

Problem: Database searches through every row (slow).

Fix: Add indexes on columns you search often.

Analogy: Like a book without an index—you have to read every page to find something.

Mistake 2: Not Using Connection Pools

Problem: Opening a new database connection for every request (slow).

Fix: Reuse connections. Keep a pool of 10-20 connections ready.

Mistake 3: Storing Everything in One Table

Problem: Table gets huge, queries get slow.

Fix: Split into related tables. One table for users, one for posts, one for comments.

Mistake 4: Not Planning for Backups

Problem: Database corrupts, no backup, lose everything.

Fix: Automated daily backups. Test restoring from backup.

Mistake 5: Premature Sharding

Problem: Sharding before you actually need it (millions of users).

Fix: Scale vertically first. Add replicas. Shard only when necessary.


Quick Database Checklist

Before launching:

  • [ ] Backups configured (automated, tested)
  • [ ] Indexes on frequently queried columns
  • [ ] Connection pooling enabled
  • [ ] Monitoring set up (CPU, disk, query times)
  • [ ] At least one replica for redundancy
When you grow:
  • [ ] Add read replicas when read load is high
  • [ ] Consider caching before adding more databases
  • [ ] Shard only when single database can't handle it
  • [ ] Monitor slow queries and optimize them

Your Challenge

Think about an app you want to build (or already built):

  • What data needs to be stored?
  • SQL or NoSQL? Why?
  • What relationships exist in your data?
  • What happens if your database crashes?
  • At what point would you need replicas? Sharding?
Write it down. This is the thinking that makes good architects.

Week 1 Complete: What You've Learned

Part 1: Why system design matters (career, performance, cost)

Part 2: Vertical vs horizontal scaling, stateless architecture

Part 3: Load balancing algorithms and high availability

Part 4: Caching strategies that make everything 10x faster

Part 5: Database fundamentals, replication, and sharding

You now understand how apps like Netflix, Instagram, and Uber actually work behind the scenes.


What's Next?

Week 2: System Architecture Patterns (Coming Soon)

  • Monolith vs Microservices
  • Event-driven architecture
  • API design patterns
  • When to use each approach
  • Real migration stories
Practice challenge: Design a simplified Twitter:
  • How do you store tweets?
  • How do you show user feeds?
  • How do you handle millions of users?
  • Draw the architecture
Share your solution—I'd love to see what you come up with!


Key Takeaways

  • SQL = structured, relationships, accurate (use for most things)
  • NoSQL = flexible, fast, scalable (use for specific needs)
  • ACID = promises that SQL databases make (reliability)
  • Replication = copies of data for speed and backup
  • Sharding = splitting data across databases (only when necessary)
  • Most apps use both SQL and NoSQL for different purposes
  • Start simple, scale when metrics prove you need to
  • Always have backups and monitoring

Congratulations! You've completed Week 1 of System Design 101.

You now know more about system design than 80% of junior developers.


Written by Amika Deshapriya Making system design simple, one story at a time.

Connect: LinkedIn | GitHub | Newsletter