Financial Committee Meeting Agenda
The time spent on any item of the agenda will be in inverse proportion to the sum [of money] involved.
Uber
GDrive
YouTube
Discord
Design Facebook, Twitter, Instagram, TikTok
Requirements: Clear
Requirements: Intentionally vague
Answers: Objective
(time and space complexity)
Answers: Subjective
(depends on requirements and justification)
Communication: VERY Important
Communication: Important
Requirements clarifications, constraints, and assumptions
Back-of-the-envelope calculations
System interface definition
System high-level design
Component detailed design
Scale the design
Summary
Requirements (4 min)
Estimation (3 min)
API (3 min)
High-level design (5 min)
Component design (15 min)
Scale / Bottlenecks (5 min)
Summary (5 min)
What does the system do?
What kind of a system should it be?
What does the system do?
Q: What part of Facebook are we going to design?
A: Design Facebook News Feed
Q: What part of Facebook News Feed do we need to design?
Adding new posts? Showing posts from friends/groups? Real-time updates? Likes? Comments?
A: Adding new posts and loading the news feed. Ultimately, we want to design the feed generation and refresh part of the data pipeline
Q: Should we handle video and photo uploads?
A: Let's leave that for v2
Q: Do we need to design a post ranking system, or should posts be in chronological order?
A: No need for a ranking system.
Let's have a list of posts in chronological order
Q: What about Ads in the news feed?
A: It'd be nice to have them, but it's not a priority for now.
Q: Should we also implement friend requests and group membership? Or can we assume this data is already available somewhere?
A: We can assume this data is available in the DB
Q: Are we designing a global multi-region service or a regional one?
A: We want this feature to be available globally
Q: How quickly do we need to have an update in a news feed once a post is published?
A: It may depend on location. Users in the same region should see updates within seconds. Users overseas may see an update within a minute
Q: What type of availability are we aiming for?
A: I'd say durability is more important. We can't lose a post once it's created. High availability is the second priority.
Q: How many daily active users do we have?
A: About 1 billion
Q: How many posts per day do we have?
A: 10 million posts per day
Q: How many friends does a user have on average?
A: About 500 friends. Of course, there are users who have many more.
Q: How many News Feed views does a user make per day? 5-10?
A: Let's say 5 News Feed views per day
Character = 1 byte
Metadata = 5-10 KB
1080p Image = 2MB
1080p Video (1 min) = 30MB
Disk Space = ~ 10TB
RAM = ~256GB - 512GB
1B daily active users (DAU)
5B views per day
10M posts per day
Posts DB Storage:
10KB * 10M = 100GB (Daily)
30 * 100GB = 3TB (Monthly)
Read-write ratio:
5B / 10M = 500:1
Throughput:
Read: 5B views * 20 posts * 10KB = 1PB/day
Write: 10M * 10KB = 100GB/day
RPS:
Read: 5B / (24*60*60) = ~58k rps
Write: 10M /(24*60*60) = ~115 rps
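The arithmetic above can be re-checked with a few lines of Python. This is only a sketch of the same calculation: the variable names are made up for illustration, units are decimal (1 KB = 1000 bytes) to match the round numbers on the slide, and 20 posts per feed view is taken from the read-throughput line.

```python
# Back-of-the-envelope estimates, reproducing the numbers stated above.
KB, GB, TB, PB = 10**3, 10**9, 10**12, 10**15  # decimal units
SECONDS_PER_DAY = 24 * 60 * 60

dau = 1_000_000_000          # 1B daily active users
views_per_user = 5           # news feed views per user per day
posts_per_day = 10_000_000   # 10M posts per day
post_size = 10 * KB          # upper bound of the 5-10 KB metadata estimate
posts_per_view = 20          # posts returned per feed view

views_per_day = dau * views_per_user                        # 5B
daily_storage = posts_per_day * post_size                   # 100 GB
monthly_storage = 30 * daily_storage                        # 3 TB
read_write_ratio = views_per_day // posts_per_day           # 500
read_throughput = views_per_day * posts_per_view * post_size  # 1 PB/day
write_throughput = posts_per_day * post_size                # 100 GB/day
read_rps = views_per_day / SECONDS_PER_DAY
write_rps = posts_per_day / SECONDS_PER_DAY

if __name__ == "__main__":
    print(f"storage: {daily_storage / GB:.0f} GB/day, {monthly_storage / TB:.0f} TB/month")
    print(f"read:write = {read_write_ratio}:1")
    print(f"throughput: read {read_throughput / PB:.0f} PB/day, write {write_throughput / GB:.0f} GB/day")
    print(f"rps: read ~{read_rps:,.0f}, write ~{write_rps:,.0f}")
```

Keeping the inputs as named variables makes it easy to redo the estimate when the interviewer changes a number, for example 10 views per day instead of 5.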
createPost()
getNewsFeed(page_id)
List of Posts
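To make the interface concrete, here is a minimal in-memory sketch of the two calls. Only the names createPost(), getNewsFeed(page_id) and "list of posts" come from the slide. Everything else is an assumption for illustration: the `Post` fields, the `user_id` and `content` parameters, and the page size of 20.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import count
from typing import List


@dataclass(frozen=True)
class Post:
    post_id: int
    user_id: int
    content: str
    created_at: datetime


class NewsFeedApi:
    """In-memory stand-in for the two endpoints; not a real service."""

    PAGE_SIZE = 20  # assumed page size

    def __init__(self) -> None:
        self._posts: List[Post] = []
        self._ids = count(1)

    def create_post(self, user_id: int, content: str) -> Post:
        """createPost(): store a new text post and return it."""
        post = Post(next(self._ids), user_id, content, datetime.now(timezone.utc))
        self._posts.append(post)
        return post

    def get_news_feed(self, page_id: int = 0) -> List[Post]:
        """getNewsFeed(page_id): one page of posts, newest first.

        post_id grows with creation order here, so it is used as the sort key.
        """
        newest_first = sorted(self._posts, key=lambda p: p.post_id, reverse=True)
        start = page_id * self.PAGE_SIZE
        return newest_first[start:start + self.PAGE_SIZE]
```

This sketch returns every post to every caller. Filtering by who the user follows is the part the component design has to solve.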
System Design Interview
Pre-sale
Discovery
[Diagram: three-tier architecture. User -> API (Presentation) -> Application (Logic) -> Database (Storage)]
[Diagram: the two APIs, createPost() and getNewsFeed(). createPost() path: User -> Load Balancer -> createPost() API -> App Server -> Database]
Does it all make sense so far?
[Diagram: createPost() path as before, plus the getNewsFeed() path: User -> Load Balancer -> getNewsFeed() API -> App Server -> Database]
Would you like to dive deeper into component design?
Do you see where I'm going here?
[Diagram: component design. Read path: User -> Load Balancer -> getNewsFeed() API -> Feed Service. Write path: User -> Load Balancer -> createPost() API -> Posts Service -> Database (RDBMS)]
User |
---|
user_id: int (PK) |
name: varchar |
email: varchar |
dob: datetime |
created_at: datetime |
Group |
---|
group_id: int (PK) |
name: varchar |
description: varchar |
created_at: datetime |
UserFollow |
---|
user_id: int |
follow_group_id: int |
follow_user_id: int |
Post |
---|
post_id: int (PK) |
content: text |
user_id: int |
group_id: int |
created_at: datetime |
-- All entities user follows
SELECT follow_user_id, follow_group_id
FROM UserFollow
WHERE user_id = :user_id
-- Posts for News Feed
SELECT Post.*
FROM UserFollow
JOIN Post ON (
Post.user_id = UserFollow.follow_user_id OR
Post.group_id = UserFollow.follow_group_id
)
WHERE UserFollow.user_id = :user_id
ORDER BY Post.created_at DESC
500 friends (AVG)
+10M Post Per Day
-- Posts for News Feed
(
SELECT Post.*
FROM UserFollow
JOIN Post ON Post.user_id = UserFollow.follow_user_id
WHERE UserFollow.user_id = :user_id
)
UNION
(
SELECT Post.*
FROM UserFollow
JOIN Post ON Post.group_id = UserFollow.follow_group_id
WHERE UserFollow.user_id = :user_id
)
ORDER BY created_at DESC
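The UNION query can be tried out against an in-memory SQLite database. This is a sketch under stated assumptions: the sample rows, the function names and the use of SQLite are illustrative, not part of the design. SQLite does not accept parentheses around the members of a compound SELECT, so they are dropped here, and the columns are listed explicitly so that ORDER BY can refer to the `created_at` alias.

```python
import sqlite3

SCHEMA = """
CREATE TABLE UserFollow (user_id INT, follow_group_id INT, follow_user_id INT);
CREATE TABLE Post (post_id INTEGER PRIMARY KEY, content TEXT, user_id INT,
                   group_id INT, created_at TEXT);
"""

# Same idea as the UNION query above: posts by followed users,
# plus posts in followed groups, newest first.
FEED_QUERY = """
SELECT p.post_id, p.content, p.user_id, p.group_id, p.created_at AS created_at
FROM UserFollow f JOIN Post p ON p.user_id = f.follow_user_id
WHERE f.user_id = :user_id
UNION
SELECT p.post_id, p.content, p.user_id, p.group_id, p.created_at AS created_at
FROM UserFollow f JOIN Post p ON p.group_id = f.follow_group_id
WHERE f.user_id = :user_id
ORDER BY created_at DESC
"""


def build_demo_db() -> sqlite3.Connection:
    """Create the two tables and a handful of made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # User 1 follows user 2 and group 10.
    conn.executemany("INSERT INTO UserFollow VALUES (?, ?, ?)",
                     [(1, None, 2), (1, 10, None)])
    conn.executemany("INSERT INTO Post VALUES (?, ?, ?, ?, ?)", [
        (1, "from a followed user", 2, None, "2024-01-01 09:00:00"),
        (2, "from a followed group", 3, 10, "2024-01-02 09:00:00"),
        (3, "followed user, followed group", 2, 10, "2024-01-03 09:00:00"),
        (4, "from a stranger", 4, None, "2024-01-04 09:00:00"),
    ])
    return conn


def news_feed(conn: sqlite3.Connection, user_id: int) -> list:
    """Run the feed query for one user and return all rows."""
    return conn.execute(FEED_QUERY, {"user_id": user_id}).fetchall()


if __name__ == "__main__":
    for row in news_feed(build_demo_db(), 1):
        print(row)
```

Post 3 matches both branches, and UNION (rather than UNION ALL) is what keeps it from appearing twice. The query still joins at read time, which is the cost the next slides try to remove.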
[Diagram, built up over several slides. Read path: User -> Load Balancer -> getNewsFeed() API -> Feed Service. Write path: User -> Load Balancer -> createPost() API -> Posts Service -> RDBMS. A second Database, a Message Queue and a Feed Generation service are added. The Message Queue is then questioned ("Stream or ETL?") and replaced by Streaming. The last slide shows Moderation and Feed Generation next to MQ and Streaming.]
Feed |
---|
feed_item_id: int (PK) |
post_id: int (PK) |
subject: varchar |
content: text |
user_id: int |
posted_user_id: int |
posted_group_id: int |
created_at: datetime |
-- Posts for News Feed
SELECT *
FROM Feed
WHERE user_id = :user_id
ORDER BY created_at DESC
Feed |
---|
feed_item_id: int (PK) |
post_id: int (PK) |
content: text |
user_id: int |
posted_user_id: int |
posted_group_id: int |
created_at: datetime |
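The Feed table turns the read into a single-key lookup because the work moves to write time: every new post is copied into the feed of each follower. Below is a minimal in-memory sketch of that fan-out. The class name, the dict-based post, the `followers` mapping and the per-feed cap are all assumptions for illustration. A real Feed Generation service would consume posts from the queue or stream and write to a database.

```python
from collections import defaultdict, deque


class FeedGeneration:
    """Fan-out on write: copy each new post into every follower's feed."""

    def __init__(self, followers, max_items=1000):
        # followers: author user_id -> iterable of follower user_ids (assumed input)
        self._followers = followers
        # One bounded feed per user; the oldest items fall off the right end.
        self._feeds = defaultdict(lambda: deque(maxlen=max_items))

    def on_post_created(self, post):
        """Handle one 'post created' event; return the number of feed rows written."""
        written = 0
        for follower_id in self._followers.get(post["user_id"], ()):
            self._feeds[follower_id].appendleft({
                "post_id": post["post_id"],
                "content": post["content"],
                "user_id": follower_id,               # owner of this feed row
                "posted_user_id": post["user_id"],    # author of the post
                "created_at": post["created_at"],
            })
            written += 1
        return written

    def get_feed(self, user_id, limit=20):
        """Equivalent of SELECT * FROM Feed WHERE user_id = ... ORDER BY created_at DESC."""
        return list(self._feeds.get(user_id, ()))[:limit]
```

With the numbers from the estimation step, 10M posts per day and about 500 followers on average, this means roughly 5B feed rows written per day. That is the trade-off being made: cheap reads in exchange for amplified writes.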
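[Diagram: Read path: User -> Load Balancer -> getNewsFeed() API -> Feed Service -> Database. Write path: User -> Load Balancer -> createPost() API -> Posts Service -> Database (RDBMS). Streaming connects the Posts Service database to Feed Generation. An Ads RTB component is added.]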
[Chart: scaling techniques plotted against Users Population (1K, 10K, 100K, 1M, 1B): Single Server, Load Balancer, Multiple Servers, Read Replicas, CDN, Cache, Rate-Limits, Stateless Services, DB Sharding, NoSQL, Regional DCs, Messaging]
[Diagram: scaling the write path. createPost() API (avg: ~100 rps, peak: ~1k rps) -> Round-Robin Load Balancer -> multiple Posts Service instances -> RDBMS db shards. Storage grows by +3TB monthly; 1 server = 10 TB.]
Shard key for the Post table: post_id? user_id? created_at?
[Diagram: Posts Service instances write to the RDBMS db shards. Debezium and Kafka (streaming) carry changes from the shards to Feed Generation. Ads RTB is shown alongside.]
[Diagram: Oleksii Petrov posts "Hello World". The post goes through the Message Queue to Feed Generation, and "Hello World" appears for each Follower (avg: 500 followers).]
[Diagram: the same flow for Oleksii Petrov (after the conference).]
What's the latency?
[Diagram: the same architecture with load figures. createPost(): avg ~100 rps, peak ~1k rps. getNewsFeed(): avg ~58k rps, peak ~580k rps.]
[Diagram: scaling the read path. NoSQL db shards are added for the Feed Service, next to Feed Generation; the Posts Service keeps its RDBMS db shards.]
Shard key for the Feed table: feed_item_id? user_id? created_at?
Hint: Consistent Hashing and virtual nodes
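For readers who have not seen the hint before, here is a small sketch of a consistent hash ring with virtual nodes. The class name, the shard names, the choice of SHA-256 and the default of 100 virtual nodes per shard are assumptions for illustration, not something the talk prescribes.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a position on the ring (a 64-bit integer)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes=100):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical node is placed on the ring many times (virtual nodes),
        # which spreads its share of keys more evenly.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def get_node(self, key: str) -> str:
        """Return the owner of key: the first virtual node clockwise from its hash."""
        if not self._ring:
            raise ValueError("ring is empty")
        idx = bisect.bisect_left(self._ring, (_hash(key),))
        return self._ring[idx % len(self._ring)][1]  # wrap around at the end


if __name__ == "__main__":
    ring = ConsistentHashRing(["shard-1", "shard-2", "shard-3", "shard-4"])
    print(ring.get_node("user:42"))
```

The property that matters for resharding: when a shard is added, the only keys that change owner are the ones that move to the new shard. With plain `hash(key) % N`, changing N remaps most keys.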
[Diagram, built up over several slides: the full design. Users reach the createPost() API (avg: ~100 rps, peak: ~1k rps) and the getNewsFeed() API (avg: ~58k rps, peak: ~580k rps) through load balancers labeled Round-Robin and Sticky. Posts Service instances write to RDBMS db shards; Debezium and Kafka (streaming) feed Feed Generation; Feed Service instances read from NoSQL db shards; Ads RTB is shown alongside. The last slides zoom in on Feed Generation with the NoSQL db shards, and on Ads RTB with the Feed Service instances and the RDBMS db shards.]
Show that you can work with fuzzy requirements
Show that you can break down large problems into pieces
Show that you can make steady progress on any task
Show that when you face a problem you're unfamiliar with,
you won't give up.
Spend at least 4-5 minutes clarifying the requirements.
You're the person who explains how everything should work. Present as if you were giving a TED Talk.
Start high-level and drill-down.
Focus on parts you know best.
If you know a solution, forget it! Or at least don't rush.
Constantly check in with the interviewer.
Constantly verify that your design still meets the requirements.
Don't rush into solutions, and have a good reason for each one.
If you're stuck, ask for a hint. It's better to get unstuck and continue than to fail completely.
Solutions Architect
Tech Lead
Developer
linkedin.com/in/alexhelkar
facebook.com/alexhelkar