System Design 101 - Part 5: Database Fundamentals
2025-11-08 • 5 min read

Database Fundamentals for System Design
← Part 4: Caching | Part 5 (You are here)
The Filing Cabinet Problem
Imagine you run a company with customer records.
Option 1: Filing cabinet with folders
- Everything organized in alphabetical order
- Folders for each customer
- Related documents stapled together
- Easy to find "John Smith's" complete file
- Throw all papers in randomly
- Super fast to add new papers
- Nightmare to find anything
- But if you need "all red papers," just dump and sort
Both work. For different problems.
SQL vs NoSQL: The Real Difference
SQL Databases (Structured)
Think of it like: Excel spreadsheets with strict rules.
How it works:
- Data stored in tables (like spreadsheets)
- Each row is a record (one user, one product)
- Columns are properties (name, email, age)
- Tables connect to each other (users have orders, orders have products)
Best for:
- Banking apps (money must be accurate)
- E-commerce (orders relate to users and products)
- Anything where relationships matter
- Data that has consistent structure
- Users table (customer info)
- Orders table (order details)
- Products table (what was bought)
- All connected together
NoSQL Databases (Flexible)
Think of it like: Folders with documents that can look different.
How it works:
- Data stored as documents (like JSON)
- Each document can have different fields
- No strict relationships required
- Very fast to read/write
Best for:
- Social media posts (varies wildly)
- Real-time analytics
- Storing logs
- Data that changes structure often
Post 1: {image, caption, likes, location}
Post 2: {video, caption, likes, tagged_users}
Post 3: {carousel_images, caption, likes}
Each post is different. NoSQL handles this easily.
The ACID Rules (For SQL)
ACID sounds scary. It's not. It's just four promises a database makes:
A = Atomicity (All or Nothing)
Simple version: Either the whole thing happens or none of it happens.
Example: Transferring money
Take $100 from Account A
Give $100 to Account B
If the database crashes after step 1, it rolls back. You don't lose $100.
Why it matters: No half-finished actions. Everything stays consistent.
C = Consistency (Rules Never Break)
Simple version: Data follows all the rules, always.
Example: Age must be a positive number. Email must have an @.
If you try to save invalid data, the database rejects it.
Why it matters: Garbage data can't get in.
I = Isolation (No Interference)
Simple version: Two people doing things at the same time don't mess each other up.
Example:
User A: Read balance ($100) → Add $50 → Save ($150)
User B: Read balance ($100) → Add $30 → Save ($130)
Without isolation, one update gets lost. With isolation, both happen correctly.
Why it matters: Concurrent users don't corrupt data.
D = Durability (Never Lose Data)
Simple version: Once saved, data doesn't disappear—even if server crashes.
Example: You click "Post" on social media. Even if the server immediately crashes, your post is saved.
Why it matters: Power outages don't destroy data.
Database Replication (Making Copies)
The problem: One database server is a single point of failure. If it dies, everything stops.
The solution: Keep copies of your data on multiple servers.
Primary-Replica Setup
How it works:
Primary Database (Main)
↓
Copies data to →
↓
Replica 1, Replica 2, Replica 3
Who does what:
- Primary: Handles all writes (create, update, delete)
- Replicas: Handle reads (just fetching data)
- Most apps read way more than they write (90% reads, 10% writes)
- Spread read load across many replicas
- If primary dies, promote a replica to primary
- All new posts go to primary database
- When you scroll your feed (reading), it comes from replicas
- Millions of reads, thousands of writes
The Trade-off: Eventual Consistency
The issue: Replicas aren't updated instantly. There's a tiny delay (milliseconds to seconds).
What this means:
You post a photo → Saved to primary
Friend refreshes feed → Reads from replica (might not have new photo yet)
2 seconds later → Replica catches up → Friend sees your photo
For social media, this is fine. For banking, it's not.
Database Sharding (Splitting Data)
The problem: Your database is too big for one server. 100 million users won't fit.
The solution: Split data across multiple databases (shards).
How Sharding Works
Example: User database
Shard 1: Users A-M (50 million users)
Shard 2: Users N-Z (50 million users)
When someone requests "John Smith," the app knows to check Shard 1 (because J is between A-M).
Common Sharding Strategies
1. By ID range:
- Users 1-10M → Shard 1
- Users 10M-20M → Shard 2
2. By hash:
- Hash user ID → Gets a number
- Number determines shard
- Evenly distributed
- US users → US database
- EU users → EU database
- Faster (data is closer to users)
- User profiles → Database 1
- User posts → Database 2
- Orders → Database 3
The Downside of Sharding
Complexity: You can't easily join data across shards.
Example:
User "John" is in Shard 1
His orders are in Shard 2
To get "John's orders," you need to query both and combine results
When to shard: Only when you absolutely have to. It adds a lot of complexity.
Choosing the Right Database
Use SQL When:
- Data has clear relationships (users → orders → products)
- Accuracy is critical (banking, inventory)
- You need complex queries (reports, analytics)
- Data structure is consistent
Use NoSQL When:
- Data structure varies (social media posts)
- Need to scale writes massively (logs, analytics)
- Relationships aren't important
- Speed matters more than absolute consistency
Use Both When:
- User accounts: SQL (structured, important)
- User activity logs: NoSQL (high volume, flexible)
- Product catalog: SQL (relationships)
- Product reviews: NoSQL (varies)
Real Story: Uber's Database Strategy
The challenge: Track millions of rides in real time, across 10,000+ cities.
Their solution:
- Ride data: Sharded SQL databases (by city)
- Location tracking: NoSQL (Cassandra)
- User profiles: SQL with replicas
Key lesson: Different data has different needs. Use the right tool for each job.
Common Database Mistakes
Mistake 1: No Indexes
Problem: Database searches through every row (slow).
Fix: Add indexes on columns you search often.
Analogy: Like a book without an index—you have to read every page to find something.
Mistake 2: Not Using Connection Pools
Problem: Opening a new database connection for every request (slow).
Fix: Reuse connections. Keep a pool of 10-20 connections ready.
Mistake 3: Storing Everything in One Table
Problem: Table gets huge, queries get slow.
Fix: Split into related tables. One table for users, one for posts, one for comments.
Mistake 4: Not Planning for Backups
Problem: Database corrupts, no backup, lose everything.
Fix: Automated daily backups. Test restoring from backup.
Mistake 5: Premature Sharding
Problem: Sharding before you actually need it (millions of users).
Fix: Scale vertically first. Add replicas. Shard only when necessary.
Quick Database Checklist
Before launching:
- [ ] Backups configured (automated, tested)
- [ ] Indexes on frequently queried columns
- [ ] Connection pooling enabled
- [ ] Monitoring set up (CPU, disk, query times)
- [ ] At least one replica for redundancy
- [ ] Add read replicas when read load is high
- [ ] Consider caching before adding more databases
- [ ] Shard only when single database can't handle it
- [ ] Monitor slow queries and optimize them
Your Challenge
Think about an app you want to build (or already built):
- What data needs to be stored?
- SQL or NoSQL? Why?
- What relationships exist in your data?
- What happens if your database crashes?
- At what point would you need replicas? Sharding?
Week 1 Complete: What You've Learned
Part 1: Why system design matters (career, performance, cost)
Part 2: Vertical vs horizontal scaling, stateless architecture
Part 3: Load balancing algorithms and high availability
Part 4: Caching strategies that make everything 10x faster
Part 5: Database fundamentals, replication, and sharding
You now understand how apps like Netflix, Instagram, and Uber actually work behind the scenes.
What's Next?
Week 2: System Architecture Patterns (Coming Soon)
- Monolith vs Microservices
- Event-driven architecture
- API design patterns
- When to use each approach
- Real migration stories
- How do you store tweets?
- How do you show user feeds?
- How do you handle millions of users?
- Draw the architecture
Key Takeaways
- SQL = structured, relationships, accurate (use for most things)
- NoSQL = flexible, fast, scalable (use for specific needs)
- ACID = promises that SQL databases make (reliability)
- Replication = copies of data for speed and backup
- Sharding = splitting data across databases (only when necessary)
- Most apps use both SQL and NoSQL for different purposes
- Start simple, scale when metrics prove you need to
- Always have backups and monitoring
Congratulations! You've completed Week 1 of System Design 101.
You now know more about system design than 80% of junior developers.
Written by Amika Deshapriya Making system design simple, one story at a time.
Connect: LinkedIn | GitHub | Newsletter