Financial Committee Meeting Agenda

  1. Atomic Reactor: $10,000,000 / unit
  2. Bicycle Shed: $2,350 / unit
  3. Coffee Beans: $4.75 / month

Northcote Parkinson:

The time spent on any item of the agenda will be in inverse proportion to the sum [of money] involved.

The Law of Triviality

What is a System Design Interview?

Uber, GDrive, Facebook, YouTube, Discord

Interview Question Examples

  • Design Facebook, Twitter, Instagram, TikTok
  • Design YouTube, Netflix, Twitch
  • Design Dropbox, Google Drive or Photos
  • Design Bitly or TinyURL
  • Design Rate Limiter or Load Balancer
  • Design Yelp or Foursquare

System Design vs Coding Interview

Coding Interview

  • Requirements: Clear
  • Answers: Objective (time and space complexity)
  • Communication: Important

System Design Interview

  • Requirements: Intentionally vague
  • Answers: Subjective (depends on requirements and justification)
  • Communication: VERY Important

System Design Interview Assessment

No hire

On Hold

  • Wasn't able to take the lead in the discussion.
  • Failed to discuss tradeoffs and couldn't justify decisions.
  • Jumped right into the design without clarifying requirements.
  • During requirements definition, important features were left out and the candidate had to be given a few hints before a fair set of features was defined.
  • The candidate identified bottlenecks, but had trouble discussing tradeoffs and missed important risks.

Interview Plan

  1. Requirements clarifications, constraints, and assumptions

  2. Back-of-the-envelope calculations

  3. System interface definition

  4. System high-level design

  5. Component detailed design

  6. Scale the design

  7. Summary

40-minute Interview Battle Plan

  1. Requirements (4 min)

  2. Estimation (3 min)

  3. API (3 min)

  4. High-level design (5 min)

  5. Component design (15 min)

  6. Scale / Bottlenecks (5 min)

  7. Summary (5 min)

Requirements clarifications

What does the system do?

What kind of a system should it be?

Design Facebook

What does the system do?

Q: What part of Facebook are we going to design?

A: Design Facebook News Feed

Q: What part of Facebook News Feed do we need to design?

Adding new posts? Showing posts from friends/groups? Real-time updates? Likes? Comments?

A: Adding new posts and loading the news feed. Ultimately, we want to design the feed generation and refresh piece of the data pipeline.

What does the system do? (cont.)

Q: Should we handle video and photo uploads?

A: Let's keep that for v2.

Q: Do we need to design a post ranking system, or should posts be in chronological order?

A: No need for a ranking system. Let's have a list of posts in chronological order.

Q: What about ads in the news feed?

A: It'd be nice to have them, but it's not a priority for now.

What does the system do? (cont.)

Q: Should we also implement friend requests and group membership, or can we assume this data is already available somewhere?

A: We can assume this data is available in the DB

What kind of a system should it be?

Q: Are we designing a global multi-region service or a regional one?

A: We want this feature to be available globally

Q: How quickly do we need to have an update in a news feed once a post is published?

A: It may depend on location. Users in the same region should see updates within seconds. Users overseas may see an update within a minute.

What kind of a system should it be? (cont.)

Q: What type of availability are we aiming for?

A: I'd say durability is more important. We can't lose a post once it's created. High availability is a second priority.

Q: How many daily active users do we have?

A: About 1 billion

Q: How many posts per day do we have?

A: 10 million posts per day

What kind of a system should it be? (cont.)

Q: How many friends does a user have on average?

A: About 500 friends. Of course, there are users who have many more.

Q: How many News Feed views does a user make per day? 5-10?

A: Let's say 5 news feed views per day.

Summary

  • Design FB News Feed
  • No photo/video in posts
  • No post ranking. Chronological order
  • Ads - nice to have (soft requirement)
  • Multi-region (hard requirement)
  • Latency (within region): < 1s
  • Latency (multi-region): < 60s
  • Durability is very important
  • Availability less important 
  • 1B daily active users
  • 10M posts per day
  • 500 average friends
  • 5 news feed views per user per day

Estimation Cheat Sheet

Storage Scale

Character = 1 byte

Metadata = 5-10 KB

1080p Image = 2MB

1080p Video (1 min) = 30MB

Server Capacity

Disk Space = ~ 10TB

RAM = ~256GB - 512GB

Estimates

Facts

1B daily active users (DAU)

5B views per day

10M posts per day

Estimates

Posts DB Storage:

10KB * 10M = 100GB (Daily)

30 * 100GB = 3TB (Monthly)

Read-write ratio:

5B / 10M = 500:1

Throughput:

Read: 5B * 20 posts * 10KB = 1PB/day

Write: 10M * 10KB = 100GB/day

RPS:

Read: 5B / (24*60*60) = ~58k rps

Write: 10M / (24*60*60) = ~116 rps
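
The arithmetic above can be re-checked with a few lines of Python. This is only a sketch: it uses decimal units, takes 10KB as the post size from the cheat sheet, and reads "20" in the read-throughput line as an assumed page size of 20 posts per view. All variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
KB, GB, TB, PB = 10**3, 10**9, 10**12, 10**15  # decimal units

dau = 1_000_000_000          # daily active users
views_per_user = 5           # news feed views per user per day
posts_per_day = 10_000_000
post_size = 10 * KB          # upper bound for metadata from the cheat sheet
posts_per_view = 20          # assumption: page size of one feed view

views_per_day = dau * views_per_user                  # 5B

daily_storage = posts_per_day * post_size             # 100 GB
monthly_storage = 30 * daily_storage                  # 3 TB
read_write_ratio = views_per_day // posts_per_day     # 500
read_throughput = views_per_day * posts_per_view * post_size  # 1 PB per day
write_throughput = posts_per_day * post_size          # 100 GB per day
read_rps = views_per_day / SECONDS_PER_DAY
write_rps = posts_per_day / SECONDS_PER_DAY

print(f"storage: {daily_storage / GB:.0f} GB/day, {monthly_storage / TB:.0f} TB/month")
print(f"read-write ratio: {read_write_ratio}:1")
print(f"read throughput: {read_throughput / PB:.0f} PB/day")
print(f"rps: read ~{read_rps:,.0f}, write ~{write_rps:,.0f}")
```

Keeping the inputs as named variables makes it easy to redo the estimate when the interviewer changes a number.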

API

createPost()

  • content
  • user_id
  • group_id
  • image_url
  • location
  • etc.

getNewsFeed(page_id)

  • Returns: List of Posts
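
As a rough illustration of these two calls, here is an in-memory sketch. The class name, the page size of 20, and the `Post` fields chosen are assumptions made for the example; the slide only names the calls and their parameters. Filtering by who the caller follows is left out on purpose.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Post:
    post_id: int
    content: str
    user_id: int
    group_id: Optional[int] = None
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class InMemoryNewsFeedApi:
    """Hypothetical stand-in for the two API calls; not a real service."""

    PAGE_SIZE = 20  # assumed page size

    def __init__(self) -> None:
        self._posts: List[Post] = []

    def create_post(self, content: str, user_id: int,
                    group_id: Optional[int] = None) -> Post:
        # post_id is a simple counter here; a real system needs a better ID scheme.
        post = Post(post_id=len(self._posts) + 1, content=content,
                    user_id=user_id, group_id=group_id)
        self._posts.append(post)
        return post

    def get_news_feed(self, page_id: int = 0) -> List[Post]:
        # Newest first, one page at a time.
        newest_first = list(reversed(self._posts))
        start = page_id * self.PAGE_SIZE
        return newest_first[start:start + self.PAGE_SIZE]


api = InMemoryNewsFeedApi()
for i in range(25):
    api.create_post(f"post {i}", user_id=1)
print([p.content for p in api.get_news_feed(page_id=1)])
```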

High-level System Design

System Design Interview

Pre-sale

Discovery

High-level Design

(Diagram labels: User, FACEBOOK)

(Diagram labels: three layers, Presentation / Logic / Storage, with User, API, Application, Database. The same layout is shown for Facebook, YouTube, and GDrive.)

(Diagram labels: Facebook in three layers, API / Logic / Storage, with User, createPost() API, getNewsFeed() API, Application, Database)

Create Post API

(Diagram labels: User, Load Balancer, createPost() API, App Server, Database)

  • Load Balancer: Round-Robin? Sticky?
  • Database: RDBMS? NoSQL?

Does it all make sense so far?

News Feed API

(Diagram labels: two request paths. createPost() API: User, Load Balancer, App Server, Database. getNewsFeed() API: User, Load Balancer, App Server, Database.)

  • Load Balancer: Round-Robin? Sticky?
  • Database: RDBMS? NoSQL?
  • Second path: are the Load Balancer, App Server, and Database the same as in the first?

News Feed

(Diagram labels: getNewsFeed() API: User, Load Balancer, Feed Service. createPost() API: User, Load Balancer, Posts Service, Database (RDBMS). Load Balancers: Round-Robin? Sticky?)

Would you like to dive deeper into component design?

Do you see where I'm going here?

Detailed Design

  • Database Schema and Queries
  • Present different approaches (pros/cons)
  • Pick a solution and explain tradeoffs
  • Address functional requirements
  • Check for violations of non-functional requirements
  • Try to think of any edge cases

NewsFeed DB Schema

User
  user_id: int (PK)
  name: varchar
  email: varchar
  dob: datetime
  created_at: datetime

Group
  group_id: int (PK)
  name: varchar
  description: varchar
  created_at: datetime

UserFollow
  user_id: int
  follow_group_id: int
  follow_user_id: int

Post
  post_id: int (PK)
  content: text
  user_id: int
  group_id: int
  created_at: datetime

News Feed Query

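-- All entities the user follows
SELECT follow_user_id, follow_group_id
FROM UserFollow
WHERE user_id = :user_id;

-- Posts for News Feed
-- DISTINCT: a post by a followed user in a followed group matches two UserFollow rows
SELECT DISTINCT Post.*
FROM UserFollow
JOIN Post ON (
  Post.user_id = UserFollow.follow_user_id OR
  Post.group_id = UserFollow.follow_group_id
)
-- user_id exists in both tables, so it must be qualified
WHERE UserFollow.user_id = :user_id
ORDER BY Post.created_at DESC;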

500 friends (avg)

+10M posts per day

News Feed Query

-- Posts for News Feed
(
  -- Posts by followed users
  SELECT Post.*
  FROM UserFollow
  JOIN Post ON Post.user_id = UserFollow.follow_user_id
  WHERE UserFollow.user_id = :user_id
)
UNION
(
  -- Posts in followed groups
  SELECT Post.*
  FROM UserFollow
  JOIN Post ON Post.group_id = UserFollow.follow_group_id
  WHERE UserFollow.user_id = :user_id
)
ORDER BY created_at DESC
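
To see what the UNION query returns, it can be run against a tiny in-memory SQLite database. This is a sketch with made-up rows. The table layout follows the schema above, `created_at` is stored as text for simplicity, and the parentheses around each SELECT are dropped because SQLite does not accept them in a compound query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserFollow (user_id INTEGER, follow_user_id INTEGER, follow_group_id INTEGER);
CREATE TABLE Post (post_id INTEGER PRIMARY KEY, content TEXT,
                   user_id INTEGER, group_id INTEGER, created_at TEXT);

-- User 1 follows user 2 and group 10 (sample data).
INSERT INTO UserFollow VALUES (1, 2, NULL), (1, NULL, 10);

INSERT INTO Post VALUES
  (100, 'from friend 2',        2, NULL, '2024-01-01 10:00:00'),
  (101, 'in group 10',          3, 10,   '2024-01-01 11:00:00'),
  (102, 'friend 2 in group 10', 2, 10,   '2024-01-01 12:00:00'),
  (103, 'unrelated',            4, NULL, '2024-01-01 13:00:00');
""")

FEED_QUERY = """
SELECT Post.* FROM UserFollow
JOIN Post ON Post.user_id = UserFollow.follow_user_id
WHERE UserFollow.user_id = :user_id
UNION
SELECT Post.* FROM UserFollow
JOIN Post ON Post.group_id = UserFollow.follow_group_id
WHERE UserFollow.user_id = :user_id
ORDER BY created_at DESC
"""

feed = conn.execute(FEED_QUERY, {"user_id": 1}).fetchall()
for row in feed:
    print(row)
```

Post 102 matches both branches but appears once, because UNION removes duplicate rows. Post 103 is excluded because user 1 follows neither its author nor a group it belongs to.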


Performance Mantras

  1. Don't do it
  2. Do it, but don't do it again
  3. Do it less
  4. Do it later
  5. Do it when they're not looking
  6. Do it concurrently
  7. Do it cheaper

News Feed

(Diagram labels: getNewsFeed() API: User, Load Balancer, Feed Service, Database. createPost() API: User, Load Balancer, Posts Service, RDBMS. Load Balancers: Round-Robin? Sticky?)

  • New Database: RDBMS? InMemory? NoSQL?

Does this strategy make sense before I continue?

News Feed

(Diagram labels: getNewsFeed() API: User, Load Balancer, Feed Service, Database (RDBMS? InMemory? NoSQL?). createPost() API: User, Load Balancer, Posts Service, RDBMS. New components: Message Queue, Feed Generation.)


News Feed

(Diagram labels: same components, with the Message Queue label replaced by the question "Stream or ETL?" next to Feed Generation)

(Diagram labels: same components, with Streaming next to Feed Generation)

(Diagram labels: same components, with Moderation and an MQ added alongside Streaming and Feed Generation)

News Feed Schema

The User, Group, UserFollow, and Post tables are unchanged. New table:

Feed
  feed_item_id: int (PK)
  post_id: int (FK)
  subject: varchar
  content: text
  user_id: int
  posted_user_id: int
  posted_group_id: int
  created_at: datetime

News Feed Query

Facebook

-- Posts for News Feed
SELECT *
FROM Feed 
WHERE user_id = :user_id
ORDER BY created_at DESC
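
The single-table query works because the feed is built ahead of time, when a post is written. A minimal sketch of that idea follows. The follow graph, the `fan_out` helper, and the dict standing in for the Feed table are all hypothetical, and groups are left out to keep it short.

```python
from collections import defaultdict

# Hypothetical follow graph: author user_id -> follower user_ids.
followers = {1: [2, 3], 2: [3]}

# Stand-in for the Feed table: user_id -> feed items, newest first.
feed_table = defaultdict(list)


def fan_out(post: dict) -> int:
    """Copy one post into the feed of every follower of its author.

    Returns the number of feed items written. Assumes posts arrive in
    chronological order, so inserting at the front keeps newest first.
    """
    targets = followers.get(post["user_id"], [])
    for follower_id in targets:
        feed_table[follower_id].insert(0, {
            "post_id": post["post_id"],
            "content": post["content"],
            "user_id": follower_id,              # owner of this feed item
            "posted_user_id": post["user_id"],   # author of the post
            "created_at": post["created_at"],
        })
    return len(targets)


def get_news_feed(user_id: int, limit: int = 20) -> list:
    # The read path is a single lookup, with no join at request time.
    return feed_table[user_id][:limit]


fan_out({"post_id": 100, "content": "Hello World", "user_id": 1,
         "created_at": "2024-01-01 10:00:00"})
fan_out({"post_id": 101, "content": "Second post", "user_id": 2,
         "created_at": "2024-01-01 11:00:00"})
print([item["post_id"] for item in get_news_feed(3)])
```

The cost moves from reads to writes: each post is stored once per follower, which is the trade-off the storage and scaling estimates have to account for.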

Does this make sense for this part of the system?

News Feed Ads

(Diagram labels: getNewsFeed() API: User, Load Balancer, Feed Service, Database (RDBMS? InMemory? NoSQL?). createPost() API: User, Load Balancer, Posts Service, RDBMS. Streaming, Feed Generation. New component: Ads RTB.)

Scalability and Bottlenecks

(Chart: scaling techniques plotted against Users Population, with marks at 1K, 10K, 100K, 1M, and 1B. Techniques shown: Single Server, Load Balancer, Multiple Servers, Read Replicas, CDN, Cache, Rate-Limits, Stateless Services, DB Sharding, NoSQL, Regional DCs, Messaging.)

News Feed Scaling

(Diagram labels: createPost() API: User, Load Balancer (Round-Robin), four Posts Service instances, four RDBMS db shards. getNewsFeed() API: User, Load Balancer, Feed Service, Database (RDBMS? InMemory? NoSQL?). Streaming, Feed Generation, Ads RTB.)

  • createPost() load: avg ~100 rps, peak ~1k rps
  • Storage growth: +3TB Monthly
  • 1 Server = 10 TB

Posts Shard Key

Post
  post_id: int (PK)
  content: text
  user_id: int
  group_id: int
  created_at: datetime

post_id ?

  • Uniform distribution
  • Bad locality

user_id ?

  • Non-uniform distribution
  • Better data locality

created_at ?

  • Worst distribution
  • Best data locality

(Diagram: each candidate key is shown against four RDBMS db shards)
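
The distribution side of this trade-off can be made visible with synthetic data. The sketch below is only an illustration: the four shards, the hash function, and the skewed workload (one heavy poster, all posts on one day) are assumptions chosen to exaggerate the effect.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4


def shard_for(key) -> int:
    """Map a key to a shard by hashing it (simple modulo placement)."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# Synthetic workload: 10,000 posts written on one day; user 7 writes half of them.
posts = [
    {"post_id": i, "user_id": 7 if i % 2 == 0 else i % 1000, "day": "2024-01-01"}
    for i in range(10_000)
]

by_post_id = Counter(shard_for(p["post_id"]) for p in posts)
by_user_id = Counter(shard_for(p["user_id"]) for p in posts)
by_day = Counter(shard_for(p["day"]) for p in posts)

print("post_id   :", sorted(by_post_id.values()))
print("user_id   :", sorted(by_user_id.values()))
print("created_at:", sorted(by_day.values()))
```

With `post_id` the posts spread almost evenly, but one user's posts end up on every shard. With `user_id` each user's posts stay together, and the heavy poster makes one shard hot. Keying on the day sends all of the current day's writes to a single shard.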

News Feed Scaling

(Diagram labels: createPost() API: User, Load Balancer (Round-Robin), Posts Service instances, RDBMS db shards; avg ~100 rps, peak ~1k rps. Debezium and Kafka provide the streaming into Feed Generation. getNewsFeed() API: User, Load Balancer, Feed Service, Database (RDBMS? InMemory? NoSQL?). Ads RTB.)

Feed Generation

(Diagram: Oleksii Petrov publishes the post "Hello World". It passes through the Message Queue to the Feed Generation workers, and a copy is delivered to each Follower. avg: 500 followers)

(after the conference)

What's the latency?
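
One way to approach the latency question is to size the fan-out first. The sketch below uses the figures stated earlier (10M posts per day, 500 followers on average, ~1k createPost() rps at peak). The per-worker write rate is a made-up number for illustration only.

```python
SECONDS_PER_DAY = 24 * 60 * 60

posts_per_day = 10_000_000   # from the requirements
avg_followers = 500          # from the requirements
peak_post_rps = 1_000        # peak createPost() rate from the scaling diagram

# Each post becomes one feed item per follower.
feed_writes_per_day = posts_per_day * avg_followers            # 5B
avg_feed_writes_per_sec = feed_writes_per_day / SECONDS_PER_DAY
peak_feed_writes_per_sec = peak_post_rps * avg_followers       # 500k

# Hypothetical: one Feed Generation worker sustains 10,000 feed writes per second.
worker_writes_per_sec = 10_000
seconds_per_post = avg_followers / worker_writes_per_sec       # one average post
workers_at_peak = -(-peak_feed_writes_per_sec // worker_writes_per_sec)  # ceiling

print(f"feed writes per day: {feed_writes_per_day:,}")
print(f"avg feed writes/s: {avg_feed_writes_per_sec:,.0f}")
print(f"peak feed writes/s: {peak_feed_writes_per_sec:,}")
print(f"fan-out time for an average post: {seconds_per_post * 1000:.0f} ms")
print(f"workers needed at peak: {workers_at_peak}")
```

Under these assumptions the fan-out itself fits well inside the one-second regional target for an average user. Accounts with far more than 500 followers and cross-region replication are where the budget gets tight.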

News Feed Scaling

(Diagram labels: createPost() API: User, Load Balancer (Round-Robin), Posts Service instances, RDBMS db shards, Debezium, Kafka streaming, Feed Generation. getNewsFeed() API: User, Load Balancer, Feed Service, Database (RDBMS? InMemory? NoSQL?). Ads RTB.)

  • createPost() load: avg ~100 rps, peak ~1k rps
  • getNewsFeed() load: avg ~58k rps, peak ~580k rps


Feed DB Estimate

  • 1B users
  • 10KB per post
  • 500 posts per user's feed kept in the DB (assumed)
  • Feed size per user: 500 * 10KB = 5MB
  • Estimated Storage: 5MB * 1B = 5PB
  • Servers: 5PB / 10TB = 500
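
The same estimate as a short calculation, using decimal units and the 10TB-per-server figure from the cheat sheet. Variable names are illustrative.

```python
KB, MB, TB, PB = 10**3, 10**6, 10**12, 10**15  # decimal units

users = 1_000_000_000
post_size = 10 * KB
posts_per_feed = 500          # assumed number of feed items kept per user
server_disk = 10 * TB         # disk space per server from the cheat sheet

feed_size_per_user = posts_per_feed * post_size      # 5 MB
total_feed_storage = users * feed_size_per_user      # 5 PB
servers = -(-total_feed_storage // server_disk)      # ceiling division

print(f"per user: {feed_size_per_user / MB:.0f} MB")
print(f"total: {total_feed_storage / PB:.0f} PB")
print(f"servers: {servers}")
```

This counts raw storage only. Replication for durability would multiply the server count by the replication factor.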

News Feed Scaling

(Diagram labels: getNewsFeed() API: User, Load Balancer (Round-Robin? Sticky?), Feed Service, four NoSQL db shards. createPost() API: User, Load Balancer (Round-Robin), Posts Service instances, RDBMS db shards, Debezium, Kafka streaming, Feed Generation. Ads RTB.)

  • createPost() load: avg ~100 rps, peak ~1k rps
  • getNewsFeed() load: avg ~58k rps, peak ~580k rps

Feed Shard Key

Feed
  feed_item_id: int (PK)
  post_id: int (FK)
  content: text
  user_id: int
  posted_user_id: int
  posted_group_id: int
  created_at: datetime

feed_item_id ?

  • Uniform distribution
  • Bad locality

user_id ?

  • Non-uniform distribution
  • Better data locality

created_at ?

  • Worst distribution
  • Best data locality

(Diagram: each candidate key is shown against four NoSQL db shards)

Hint: Consistent Hashing and virtual nodes
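
A minimal sketch of what the hint refers to, assuming the feed is sharded by `user_id`. The class, the shard names, and the choice of 100 virtual nodes per shard are illustrative and not taken from any particular database.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Hash ring where each physical node owns many virtual nodes."""

    def __init__(self, nodes=(), vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add_node(self, node: str) -> None:
        # Virtual nodes spread one physical node over many points on the ring.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key) -> str:
        if not self._ring:
            raise ValueError("ring is empty")
        # First virtual node clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, (self._hash(str(key)),))
        return self._ring[idx % len(self._ring)][1]


ring = ConsistentHashRing(["feed-db-0", "feed-db-1", "feed-db-2", "feed-db-3"])
before = {user_id: ring.get_node(user_id) for user_id in range(10_000)}

ring.add_node("feed-db-4")
after = {user_id: ring.get_node(user_id) for user_id in range(10_000)}

moved = sum(1 for user_id in before if before[user_id] != after[user_id])
print(f"{moved} of {len(before)} users moved after adding a shard")
```

Adding a shard moves only the keys that the new shard takes over. With plain `hash(user_id) % N`, changing N would remap most users.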

News Feed Scaling

(Diagram labels: getNewsFeed() API and createPost() API, each with User and Load Balancer (Round-Robin), Feed Generation, Ads RTB, Posts Service instances, RDBMS)

  • createPost() load: avg ~100 rps, peak ~1k rps
  • getNewsFeed() load: avg ~58k rps, peak ~580k rps