The Law of Triviality
Northcote Parkinson:
"The time spent on any item of the agenda will be in inverse proportion to the sum [of money] involved."
Financial Committee Meeting Agenda
- Atomic reactor: $10,000,000 / unit
- Bicycle shed: $2,350 / unit
- Coffee beans: $4.75 / month
What is a System Design Interview?
Example systems: Uber, GDrive, YouTube, Discord
Interview Question Examples
- Design Facebook, Twitter, Instagram, TikTok
- Design YouTube, Netflix, Twitch
- Design Dropbox, Google Drive or Google Photos
- Design Bitly or TinyURL
- Design a Rate Limiter or Load Balancer
- Design Yelp or Foursquare
System Design vs Coding Interview
Aspect | Coding Interview | System Design Interview |
---|---|---|
Requirements | Clear | Intentionally vague |
Answers | Objective (time and space complexity) | Subjective (depends on requirements and justification) |
Communication | Important | VERY important |
System Design Interview Assessment
No hire:
- Wasn't able to take the lead in the discussion.
- Failed to discuss tradeoffs and couldn't justify decisions.
- Jumped right into the design without clarifying requirements.
On hold:
- During requirements definition, important features were left out and the candidate had to be given a few hints before a fair set of features was defined.
- The candidate identified bottlenecks, but had trouble discussing tradeoffs and missed important risks.
Interview Plan
- Requirements clarifications, constraints, and assumptions
- Back-of-the-envelope calculations
- System interface definition
- System high-level design
- Component detailed design
- Scale the design
- Summary
40-minute Interview Battle Plan
- Requirements (4 min)
- Estimation (3 min)
- API (3 min)
- High-level design (5 min)
- Component design (15 min)
- Scale / Bottlenecks (5 min)
- Summary (5 min)
Requirements clarifications
What does the system do?
What kind of a system should it be?
Design Facebook
What does the system do?
Q: Which part of Facebook are we going to design?
A: The Facebook News Feed.
Q: Which part of the News Feed do we need to design?
Adding new posts? Showing posts from friends/groups? Real-time updates? Likes? Comments?
A: Adding new posts and loading the news feed. Ultimately, we want to design the feed generation and refresh part of the data pipeline.
Q: Should we handle video and photo uploads?
A: Let's leave it for v2.
Q: Do we need to design a post ranking system, or should posts be in chronological order?
A: No need for a ranking system. Let's have a list of posts in chronological order.
Q: What about ads in the news feed?
A: It would be nice to have them, but they're not a priority for now.
Q: Should we also implement friend requests and group membership, or can we assume this data is already available somewhere?
A: We can assume this data is available in the DB.
What kind of a system should it be?
Q: Are we designing a global multi-region service or a regional one?
A: We want this feature to be available globally
Q: How quickly should an update appear in the news feed once a post is published?
A: It may depend on location. Users in the same region should see updates within seconds. Users across the ocean may see an update within a minute.
Q: What level of availability are we aiming for?
A: I'd say durability matters more. We can't lose a post once it's created. High availability is a second priority.
Q: How many daily active users do we have?
A: About 1 billion.
Q: How many posts per day do we have?
A: 10 million posts per day.
Q: How many friends does a user have on average?
A: About 500 friends. Of course, some users have many more.
Q: How many news feed views does a user make per day? 5-10?
A: Let's say 5 news feed views per day.
Summary
- Design FB News Feed
- No photo/video in posts
- No post ranking. Chronological order
- Ads - nice to have (soft requirement)
- Multi-region (hard requirement)
- Latency (within region): < 1s
- Latency (multi-region): < 60s
- Durability is very important
- Availability less important
- 1B daily active users
- 10M posts per day
- 500 average friends
- 5 news feed views per user per day
Estimation Cheat Sheet
Storage Scale
- Character = 1 byte
- Metadata = 5-10KB
- 1080p image = 2MB
- 1080p video (1 min) = 30MB
Server Capacity
- Disk space = ~10TB
- RAM = ~256GB-512GB
Estimates
Facts
1B daily active users (DAU)
5B views per day
10M posts per day
Calculations
Posts DB storage:
10KB * 10M = 100GB (daily)
30 * 100GB = 3TB (monthly)
Read-write ratio:
5B / 10M = 500:1
Throughput:
Read: 5B views * 20 posts per view * 10KB = 1PB/day
Write: 10M * 10KB = 100GB/day
RPS:
Read: 5B / (24*60*60) = ~58k rps
Write: 10M / (24*60*60) = ~116 rps
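The estimates above can be re-derived with a few lines of Python. This is a minimal sketch: the inputs are the facts from the requirements plus the assumptions already used on this slide (10KB per post, 20 posts loaded per feed view), and it uses decimal units (1KB = 1,000 bytes) so the numbers stay round.

```python
# Back-of-the-envelope estimates for the News Feed, in decimal units.
KB, GB, TB, PB = 10**3, 10**9, 10**12, 10**15
SECONDS_PER_DAY = 24 * 60 * 60

DAU = 1_000_000_000            # daily active users
VIEWS_PER_USER = 5             # news feed views per user per day
POSTS_PER_DAY = 10_000_000     # new posts per day
POST_SIZE = 10 * KB            # assumed size of one post with metadata
POSTS_PER_VIEW = 20            # assumed posts loaded per feed view

views_per_day = DAU * VIEWS_PER_USER                               # 5B
daily_storage = POSTS_PER_DAY * POST_SIZE                          # 100GB
monthly_storage = 30 * daily_storage                               # 3TB
read_write_ratio = views_per_day // POSTS_PER_DAY                  # 500
read_bytes_per_day = views_per_day * POSTS_PER_VIEW * POST_SIZE    # 1PB
read_rps = views_per_day / SECONDS_PER_DAY                         # ~58k
write_rps = POSTS_PER_DAY / SECONDS_PER_DAY                        # ~116

if __name__ == "__main__":
    print(f"Storage: {daily_storage / GB:.0f}GB/day, {monthly_storage / TB:.0f}TB/month")
    print(f"Read:write ratio: {read_write_ratio}:1")
    print(f"Read throughput: {read_bytes_per_day / PB:.0f}PB/day")
    print(f"RPS: read ~{read_rps:,.0f}, write ~{write_rps:,.0f}")
```

Changing a single input, for example POSTS_PER_VIEW, shows how sensitive the read throughput is to that one assumption.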
API
createPost() takes the post:
- content
- user_id
- group_id
- image_url
- location
- etc.
getNewsFeed(page_id) returns a list of posts.
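To make the two calls concrete, here is a hypothetical in-memory sketch in Python. The class and method names (NewsFeedAPI, create_post, get_news_feed), the PAGE_SIZE of 20, and passing user_id explicitly (a real API would presumably take it from the authenticated session) are all assumptions for illustration. It only shows the contract: one call writes a post, the other returns a page of posts from followed users and groups, newest first.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

PAGE_SIZE = 20  # assumed page size, matching the 20 posts per view used in the estimates


@dataclass
class Post:
    post_id: int
    user_id: int
    content: str
    group_id: Optional[int] = None
    created_at: int = 0  # logical timestamp; a real service would store a datetime


@dataclass
class NewsFeedAPI:
    """Hypothetical in-memory stand-in for createPost() / getNewsFeed()."""
    follows_users: dict = field(default_factory=dict)   # user_id -> set of followed user_ids
    follows_groups: dict = field(default_factory=dict)  # user_id -> set of followed group_ids
    posts: list = field(default_factory=list)
    _ids: count = field(default_factory=count)

    def create_post(self, user_id: int, content: str, group_id: Optional[int] = None) -> Post:
        post_id = next(self._ids)
        post = Post(post_id, user_id, content, group_id, created_at=post_id)
        self.posts.append(post)
        return post

    def get_news_feed(self, user_id: int, page_id: int = 0) -> list:
        users = self.follows_users.get(user_id, set())
        groups = self.follows_groups.get(user_id, set())
        # Naive: filter every stored post on each read.
        feed = [p for p in self.posts if p.user_id in users or p.group_id in groups]
        feed.sort(key=lambda p: p.created_at, reverse=True)  # chronological, newest first
        start = page_id * PAGE_SIZE
        return feed[start:start + PAGE_SIZE]


if __name__ == "__main__":
    api = NewsFeedAPI(follows_users={1: {2}}, follows_groups={1: {10}})
    api.create_post(2, "Hello World")
    api.create_post(3, "Not followed")
    api.create_post(3, "Group post", group_id=10)
    print([p.content for p in api.get_news_feed(1)])
```

This naive version filters every stored post on each read, which is exactly the cost the rest of the design works to avoid.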
High-level System Design
[Figure: System Design Interview, Pre-sale, Discovery]
[Figure: three-tier high-level design, Presentation → Logic → Storage (User → API → Application → Database), shown for YouTube and GDrive]
[Figure: News Feed high-level design, User → createPost() API and getNewsFeed() API → Application → Database]
Create Post API
[Figure: User → Load Balancer → App Server (createPost() API) → Database]
- Load balancer: Round-Robin? Sticky?
- Database: RDBMS? NoSQL?
Does it all make sense for now?
News Feed API
[Figure: two request paths, User → Load Balancer → App Server → Database, one serving the createPost() API and one serving the getNewsFeed() API]
- Load balancer: Round-Robin? Sticky? Is it the same one?
- App server: is it the same one?
- Database: RDBMS? NoSQL? Is it the same one?
News Feed
Would you like to dive deeper into component design?
Do you see where I'm going here?
[Figure: a write path, User → Load Balancer → Posts Service (createPost() API) → Database (RDBMS), and a separate read path, User → Load Balancer → Feed Service (getNewsFeed() API); for each load balancer: Round-Robin? Sticky?]
Detailed Design
- Database Schema and Queries
- Present different approaches (pros/cons)
- Pick a solution and explain tradeoffs
- Address functional requirements
- Check for violations of non-functional requirements
- Try to think of any edge cases
NewsFeed DB Schema
User |
---|
user_id: int (PK) |
name: varchar |
email: varchar |
dob: datetime |
created_at: datetime |
Group |
---|
group_id: int (PK) |
name: varchar |
description: varchar |
created_at: datetime |
UserFollow |
---|
user_id: int |
follow_group_id: int |
follow_user_id: int |
Post |
---|
post_id: int (PK) |
content: text |
user_id: int |
group_id: int |
created_at: datetime |
News Feed Query
-- All entities user follows
SELECT follow_user_id, follow_group_id
FROM UserFollow
WHERE user_id = :user_id
-- Posts for News Feed
SELECT Post.*
FROM UserFollow
JOIN Post ON (
  Post.user_id = UserFollow.follow_user_id OR
  Post.group_id = UserFollow.follow_group_id
)
WHERE UserFollow.user_id = :user_id
ORDER BY Post.created_at DESC
Cost: about 500 follows per user on average, joined against a Post table that grows by 10M+ posts per day, on every news feed view.
News Feed Query
-- Posts for News Feed
(
SELECT Post.*
FROM UserFollow
JOIN Post ON Post.user_id = UserFollow.follow_user_id
WHERE UserFollow.user_id = :user_id
)
UNION
(
SELECT Post.*
FROM UserFollow
JOIN Post ON Post.group_id = UserFollow.follow_group_id
WHERE UserFollow.user_id = :user_id
)
ORDER BY created_at DESC
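Both feed queries can be tried on a toy dataset with Python's built-in sqlite3 module. This is a minimal sketch, assuming one UserFollow row per followed entity (the other column left NULL) and indexes on Post.user_id and Post.group_id, which are not part of the schema above. SQLite does not accept parentheses around the SELECTs of a UNION, so they are dropped here. A plausible reason to prefer the UNION form is that each branch is a plain equality join that can use a single index, whereas an OR across two columns often gets in the way of that in many engines. The sketch also shows a side effect: a friend's post in a followed group comes back twice from the OR join, but only once from UNION, which removes duplicate rows.

```python
import sqlite3

# Toy schema; the two Post indexes are an assumption, not part of the slides.
SCHEMA = """
CREATE TABLE UserFollow (user_id INTEGER, follow_group_id INTEGER, follow_user_id INTEGER);
CREATE TABLE Post (post_id INTEGER PRIMARY KEY, content TEXT, user_id INTEGER,
                   group_id INTEGER, created_at TEXT);
CREATE INDEX post_by_user ON Post (user_id);
CREATE INDEX post_by_group ON Post (group_id);
"""

OR_QUERY = """
SELECT Post.* FROM UserFollow
JOIN Post ON (Post.user_id = UserFollow.follow_user_id
              OR Post.group_id = UserFollow.follow_group_id)
WHERE UserFollow.user_id = :user_id
ORDER BY Post.created_at DESC
"""

# SQLite rejects parentheses around the SELECTs of a compound query, so they are omitted.
UNION_QUERY = """
SELECT Post.* FROM UserFollow
JOIN Post ON Post.user_id = UserFollow.follow_user_id
WHERE UserFollow.user_id = :user_id
UNION
SELECT Post.* FROM UserFollow
JOIN Post ON Post.group_id = UserFollow.follow_group_id
WHERE UserFollow.user_id = :user_id
ORDER BY created_at DESC
"""


def demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # User 1 follows user 2 and group 10: one row per followed entity.
    conn.executemany("INSERT INTO UserFollow VALUES (?, ?, ?)",
                     [(1, None, 2), (1, 10, None)])
    conn.executemany("INSERT INTO Post VALUES (?, ?, ?, ?, ?)", [
        (1, "Hello World", 2, None, "2024-01-01 10:00"),
        (2, "Friend's post in a followed group", 2, 10, "2024-01-01 11:00"),
        (3, "Group post", 3, 10, "2024-01-01 12:00"),
    ])
    return conn


def feed_ids(conn: sqlite3.Connection, query: str, user_id: int) -> list:
    """Run a feed query and return only the post_ids, in result order."""
    return [row[0] for row in conn.execute(query, {"user_id": user_id})]


if __name__ == "__main__":
    conn = demo_db()
    print("OR join:", feed_ids(conn, OR_QUERY, 1))
    print("UNION:  ", feed_ids(conn, UNION_QUERY, 1))
```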
Performance Mantras
- Don't do it
- Do it, but don't do it again
- Do it less
- Do it later
- Do it when they're not looking
- Do it concurrently
- Do it cheaper
News Feed
[Figure: as before, but the Feed Service now gets its own database. Feed database: RDBMS? In-memory? NoSQL?]
Does this strategy make sense before we continue?
News Feed
[Figure: a Message Queue and a Feed Generation service added between the Posts Service and the Feed Service database]
News Feed
[Figure: Posts Service database → ? → Feed Generation]
Stream or ETL?
[Figure: Streaming from the Posts Service database into Feed Generation]
[Figure: the pipeline with Moderation, MQ, Streaming, and Feed Generation]
News Feed Schema
Feed |
---|
feed_item_id: int (PK) |
post_id: int |
subject: varchar |
content: text |
user_id: int |
posted_user_id: int |
posted_group_id: int |
created_at: datetime |
News Feed Query
-- Posts for News Feed
SELECT *
FROM Feed
WHERE user_id = :user_id
ORDER BY created_at DESC
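With the precomputed Feed table, reading a feed becomes a lookup on a single user_id. Below is a minimal sqlite3 sketch, assuming a composite index on (user_id, created_at); the index name feed_by_user is hypothetical, and the subject column is left out for brevity. With that index the planner can walk the index in order instead of sorting, and EXPLAIN QUERY PLAN makes the choice visible. Note the denormalization: the same post is stored once per reader.

```python
import sqlite3

# Toy Feed table; the composite index and its name are assumptions.
SCHEMA = """
CREATE TABLE Feed (
    feed_item_id INTEGER PRIMARY KEY,
    post_id INTEGER,
    content TEXT,
    user_id INTEGER,          -- whose feed this item belongs to
    posted_user_id INTEGER,
    posted_group_id INTEGER,
    created_at TEXT
);
CREATE INDEX feed_by_user ON Feed (user_id, created_at);
"""

FEED_QUERY = "SELECT * FROM Feed WHERE user_id = :user_id ORDER BY created_at DESC"


def demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO Feed (post_id, content, user_id, posted_user_id, created_at) "
        "VALUES (?, ?, ?, ?, ?)",
        [
            (1, "Hello World", 1, 2, "2024-01-01 10:00"),
            (1, "Hello World", 3, 2, "2024-01-01 10:00"),  # same post, another reader's feed
            (2, "Second post", 1, 4, "2024-01-01 11:00"),
        ],
    )
    return conn


def query_plan(conn: sqlite3.Connection) -> str:
    """Join the 'detail' column of EXPLAIN QUERY PLAN for the feed query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + FEED_QUERY, {"user_id": 1})
    return " | ".join(row[-1] for row in rows)


if __name__ == "__main__":
    conn = demo_db()
    print([row[1] for row in conn.execute(FEED_QUERY, {"user_id": 1})])  # post_ids
    print(query_plan(conn))  # shows which index, if any, the planner picks
```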
Does this make sense for this part of the system?
News Feed Ads
[Figure: the design so far, with an Ads RTB (real-time bidding) service added next to Feed Generation]
Scalability and Bottlenecks
[Figure: scaling techniques by user population (1K, 10K, 100K, 1M, 1B): single server, load balancer, multiple servers, read replicas, CDN, cache, rate limits, stateless services, DB sharding, NoSQL, regional DCs, messaging]
News Feed Scaling
[Figure: the Posts Service scaled out to several instances behind a Round-Robin load balancer (avg: ~100 rps, peak: ~1k rps), and the posts RDBMS split into db shards, since it grows by ~3TB per month while one server holds ~10TB]
Posts Shard Key
- post_id: uniform distribution, bad locality
- user_id: non-uniform distribution, better data locality
- created_at: worst distribution, best data locality
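The shard-key tradeoffs can be made tangible with a toy simulation. This is a sketch under stated assumptions: 4 shards, a hypothetical workload where user 0 posts 10 times a day and users 1-9 post once a day for 28 days, modulo standing in for a hash function, and created_at partitioned into one-week ranges. For each shard key it reports rows per shard, which shards receive today's writes, and which shards hold one user's posts.

```python
from collections import Counter

NUM_SHARDS = 4
DAYS = 28
DAYS_PER_SHARD = DAYS // NUM_SHARDS  # created_at ranges: one week per shard


def make_posts():
    """Hypothetical workload: user 0 posts 10 times a day, users 1-9 once a day."""
    posts, post_id = [], 0
    for day in range(DAYS):
        for user_id in [0] * 10 + list(range(1, 10)):
            posts.append({"post_id": post_id, "user_id": user_id, "day": day})
            post_id += 1
    return posts


SHARD_KEYS = {
    "post_id": lambda p: p["post_id"] % NUM_SHARDS,
    "user_id": lambda p: p["user_id"] % NUM_SHARDS,
    "created_at": lambda p: p["day"] // DAYS_PER_SHARD,  # range partitioning by time
}


def analyze(posts, key):
    shard_of = SHARD_KEYS[key]
    rows_per_shard = Counter(shard_of(p) for p in posts)
    today = DAYS - 1
    shards_written_today = {shard_of(p) for p in posts if p["day"] == today}
    shards_for_one_user = {shard_of(p) for p in posts if p["user_id"] == 5}
    return rows_per_shard, shards_written_today, shards_for_one_user


if __name__ == "__main__":
    posts = make_posts()
    for key in SHARD_KEYS:
        rows, today, user = analyze(posts, key)
        print(f"{key:>10}: rows={dict(sorted(rows.items()))} "
              f"written today={sorted(today)} user 5 on={sorted(user)}")
```

In this toy run, post_id spreads rows and today's writes evenly but scatters one user's posts over every shard; user_id keeps a user on a single shard but lets the heavy poster skew its shard; created_at balances total rows yet sends all of today's writes to one shard, while keeping each week's posts together.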
News Feed Scaling
[Figure: change data capture from the posts db shards with Debezium into Kafka, streaming to Feed Generation]
[Figure: fan-out on write. Oleksii Petrov publishes "Hello World"; the post goes through the Message Queue to Feed Generation, which copies it into the feed of every follower (avg: 500 followers). A second panel shows the same flow, labeled "(after the conference)".]
What's the latency?
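Fan-out on write, as in the figure, can be sketched in Python with a deque standing in for the message queue and a dict standing in for the Feed table. All names here are hypothetical. Each post costs one write per follower, so at the requirements' averages the fan-out is about 10M * 500 = 5B feed writes per day, roughly 58k writes per second on average, before peaks or authors with far more followers.

```python
from collections import defaultdict, deque

SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical follower graph: author -> users who follow them (avg: 500).
followers = {"oleksii": [f"user{i}" for i in range(500)]}

queue = deque()            # stand-in for the message queue / Kafka topic
feeds = defaultdict(list)  # stand-in for the Feed table: reader -> feed items


def create_post(author_id: str, post_id: int, content: str) -> None:
    """Posts Service side: persist the post (omitted here) and publish an event."""
    queue.append({"post_id": post_id, "author_id": author_id, "content": content})


def run_feed_generation() -> int:
    """Feed Generation side: drain the queue, write one feed item per follower."""
    writes = 0
    while queue:
        event = queue.popleft()
        for reader in followers.get(event["author_id"], []):
            feeds[reader].append(dict(event))  # one denormalized copy per reader
            writes += 1
    return writes


def daily_fanout_writes(posts_per_day: int, avg_followers: int) -> int:
    return posts_per_day * avg_followers


if __name__ == "__main__":
    create_post("oleksii", 1, "Hello World")
    print(run_feed_generation(), "feed writes for one post")
    total = daily_fanout_writes(10_000_000, 500)
    print(f"{total:,} feed writes/day, ~{total / SECONDS_PER_DAY:,.0f} per second")
```

Since the work per post grows linearly with the author's follower count, the latency until the last follower sees a post grows with it too.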
News Feed Scaling
[Figure: the same design sized for traffic: Feed Service reads at avg ~58k rps and peak ~580k rps, Posts Service writes at avg ~100 rps and peak ~1k rps, with Debezium → Kafka streaming into Feed Generation and Ads RTB alongside]
Feed DB Estimate
- 1B users
- 10KB per post
- 500 posts stored per user's feed (assumed)
- Storage per user: 500 * 10KB = 5MB
- Estimated storage: 5MB * 1B = 5PB
- Servers: 5PB / 10TB = 500
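The same estimate in Python, using only the slide's assumptions and decimal units. It counts raw storage only; any replicas kept for durability would multiply both the storage and the server count.

```python
# Feed DB sizing from the slide's assumptions (decimal units).
KB, MB, TB, PB = 10**3, 10**6, 10**12, 10**15

USERS = 1_000_000_000
POST_SIZE = 10 * KB          # size of one stored feed item
POSTS_PER_FEED = 500         # assumed number of posts kept per user's feed
SERVER_DISK = 10 * TB        # from the estimation cheat sheet

per_user = POSTS_PER_FEED * POST_SIZE     # 5MB
total = USERS * per_user                  # 5PB
servers = total // SERVER_DISK            # 500

if __name__ == "__main__":
    print(f"{per_user / MB:.0f}MB per user, {total / PB:.0f}PB total, {servers} servers")
```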
News Feed Scaling
[Figure: the Feed Service database becomes NoSQL db shards, filled by Feed Generation from the Debezium → Kafka stream; load balancer for the Feed Service: Round-Robin? Sticky?]
Feed Shard Key
- feed_item_id: uniform distribution, bad locality
- user_id: non-uniform distribution, better data locality
- created_at: worst distribution, best data locality
Hint: Consistent Hashing and virtual nodes
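Following the hint, here is a minimal consistent-hashing ring with virtual nodes. Assumptions: MD5 (via hashlib) for stable ring positions, 100 virtual nodes per shard, and hypothetical shard names such as feed-shard-0. Each physical shard owns many small arcs of the ring, which evens out the load, and adding a shard only moves the keys that now land on its arcs (on the order of 1/N of them) instead of remapping almost everything, as user_id % N would when N changes.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Stable 64-bit ring position (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, nodes, vnodes: int = 100):
        self.vnodes = vnodes      # virtual nodes per physical shard
        self._ring = []           # sorted (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (ring_position(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def node_for(self, key: str) -> str:
        """Owner is the first virtual node clockwise from the key, wrapping at the end."""
        idx = bisect.bisect_left(self._ring, (ring_position(key),))
        return self._ring[idx % len(self._ring)][1]


if __name__ == "__main__":
    ring = ConsistentHashRing([f"feed-shard-{i}" for i in range(4)])
    users = [f"user{i}" for i in range(10_000)]
    before = {u: ring.node_for(u) for u in users}
    ring.add_node("feed-shard-4")
    moved = [u for u in users if ring.node_for(u) != before[u]]
    print(f"{len(moved) / len(users):.1%} of users moved to the new shard")
```

The same lookup could also route a user_id to a fixed Feed Service instance, which is one possible way to get the Sticky behavior shown in the diagrams (an assumption; the slides do not specify it).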
News Feed Scaling
[Figure: the Feed Service scaled out to several instances behind a Sticky load balancer, reading from the NoSQL db shards]
News Feed Region
[Figure: one region end to end: Round-Robin load balancer → Posts Services → RDBMS db shards → Debezium → Kafka streaming → Feed Generation (with Ads RTB) → NoSQL db shards → Feed Services behind a Sticky load balancer]
News Feed Regions
[Figure: the regional stack deployed in several regions; each region runs its own Feed Generation, Debezium → Kafka streaming, Ads RTB, RDBMS and NoSQL db shards, and Feed Services]
Requirements
- Design FB News Feed
- No photo/video in posts
- No post ranking. Chronological order
- Ads - nice to have (soft requirement)
- Multi-region (hard requirement)
- Latency (within region): < 1s
- Latency (multi-region): < 60s
- Durability is very important
- Availability less important
- 1B daily active users
- 10M posts per day
- 500 average friends
- 5 news feed views per user per day
Have we built Facebook?
What was the point?
- Show that you can work with fuzzy requirements
- Show that you can break down large problems into pieces
- Show that you can make steady progress on any task
- Show that when you face a problem you're unfamiliar with, you won't give up
Interview Tips
- Spend at least 4-5 minutes clarifying the requirements.
- You're the one who explains how everything should work. Present it like a TED Talk.
- Start high-level and drill down.
- Focus on the parts you know best.
- If you know a solution, forget it! Or at least don't rush.
- Constantly check in with the interviewer.
- Constantly verify that you're okay with the requirements.
- Don't rush into solutions, and have a good reason for each one.
- If you're stuck, ask for a hint. It's better to get unstuck and continue than to fail completely.
Useful Stuff
Who am I?
Solutions Architect
Tech Lead
Developer
Find me on
linkedin.com/in/alexhelkar
facebook.com/alexhelkar
Questions?
System Design Interview
By Oleksii Petrov