Sam Wu
Data Engineer/Software Engineer/DL Researcher
from
Grokking the System Design Interview
Performance in these interviews reflects upon your ability to work with complex systems and translates into the position and salary the interviewing company offers you.
- Educative.io
Assumptions: 500M new URL shortenings per month, with a 100:1 read/write ratio.
Expected redirections per month:
100 * 500M => 50B
New URL shortenings QPS:
500M / (30d * 24h * 3600s) = ~200 URLs/s
URL redirections QPS (R:W = 100:1):
100 * 200 URLs/s = 20K/s
Total number of objects (stored for 5 years):
500M * 5y * 12m = 30B
Assuming each object is 500 bytes, we will need:
30B * 500 bytes = 15 TB
Write:
200 * 500 bytes = 100 KB/s
Read:
20K * 500 bytes = ~10 MB/s
Following the 80-20 rule, 20% of URLs generate 80% of traffic, so we cache the hottest 20% of daily requests.
Requests per day:
20K * 3600s * 24h = ~1.7B
Cache Size:
0.2 * 1.7B * 500 bytes = ~170GB
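The estimates above can be reproduced with a few lines of arithmetic. This is a minimal sketch using only the figures from the slides; the constant and variable names are illustrative, and the cache calculation starts from the rounded 20K/s figure, as the slides do.

```python
# Back-of-envelope capacity estimates from the stated assumptions.
NEW_URLS_PER_MONTH = 500_000_000  # 500M new shortenings per month
READ_WRITE_RATIO = 100            # 100 redirections per new URL
OBJECT_SIZE_BYTES = 500           # assumed size of one stored object
RETENTION_YEARS = 5
SECONDS_PER_MONTH = 30 * 24 * 3600

write_qps = NEW_URLS_PER_MONTH / SECONDS_PER_MONTH  # about 193, rounded to 200 on the slides
read_qps = READ_WRITE_RATIO * write_qps             # about 19.3K, rounded to 20K on the slides

total_objects = NEW_URLS_PER_MONTH * 12 * RETENTION_YEARS  # 30B objects
storage_bytes = total_objects * OBJECT_SIZE_BYTES          # 15 TB

requests_per_day = 20_000 * 3600 * 24                      # about 1.7B
cache_bytes = requests_per_day * OBJECT_SIZE_BYTES // 5    # hottest 20%, about 170 GB

print(f"write QPS   ~{write_qps:.0f}")
print(f"read QPS    ~{read_qps:.0f}")
print(f"storage     {storage_bytes / 1e12:.0f} TB")
print(f"cache size  ~{cache_bytes / 1e9:.0f} GB")
```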
Once we've finalized the requirements, it's always a good idea to define the system APIs.
Choose a way to expose the API:
createURL(api_dev_key, original_url, custom_alias=None, user_name=None, expire_date=None)
Returns: (string)
A successful insertion returns the shortened URL; otherwise, it returns an error code.
deleteURL(api_dev_key, url_key)
“url_key”: string representing the shortened URL.
Returns: (‘URL Removed’)
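To make the two calls concrete, here is a minimal in-memory sketch. Everything beyond the two signatures is an assumption for illustration: the `_urls` dictionary stands in for the database, `short.example` is a made-up domain, the error code strings are hypothetical, keys are drawn at random, and the expiry parameter is spelled `expire_date`.

```python
import secrets
import string

ALPHABET = string.digits + string.ascii_letters  # 62 symbols
KEY_LENGTH = 7
BASE_URL = "http://short.example/"  # hypothetical domain

# Hypothetical in-memory store standing in for the database: url_key -> record
_urls = {}


def createURL(api_dev_key, original_url, custom_alias=None,
              user_name=None, expire_date=None):
    """Return the shortened URL on success, otherwise an error code string."""
    if custom_alias is not None:
        if custom_alias in _urls:
            return "ERROR_ALIAS_TAKEN"  # hypothetical error code
        url_key = custom_alias
    else:
        # Draw random 7-character keys until an unused one is found
        while True:
            url_key = "".join(secrets.choice(ALPHABET) for _ in range(KEY_LENGTH))
            if url_key not in _urls:
                break
    _urls[url_key] = {
        "api_dev_key": api_dev_key,
        "original_url": original_url,
        "user_name": user_name,
        "expire_date": expire_date,
    }
    return BASE_URL + url_key


def deleteURL(api_dev_key, url_key):
    """Return 'URL Removed' on success, otherwise an error code string."""
    record = _urls.get(url_key)
    if record is None:
        return "ERROR_NOT_FOUND"  # hypothetical error code
    if record["api_dev_key"] != api_dev_key:
        return "ERROR_FORBIDDEN"  # hypothetical: only the creator may delete
    del _urls[url_key]
    return "URL Removed"
```

A real service would also validate `api_dev_key` against a quota before accepting the request; that step is omitted here.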
Defining the DB schema early in the system design interview helps clarify the data flow among the various components.
Observations:
What kind of DB should we use?
NoSQL:
DynamoDB,
Cassandra,
Riak
Why use base 62 rather than base 10? (We take 7 characters.)
Base 10 => 10^7, i.e. 10 million combinations
Base 62 => 62^7, i.e. ~3.5 trillion combinations
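The key-space figures can be checked directly, and a base-62 encoder is short to write. The sketch below assumes the key is derived from a non-negative integer ID, which the slides do not specify; the alphabet order (digits, lowercase, uppercase) is also an arbitrary choice.

```python
import string

BASE62 = string.digits + string.ascii_lowercase + string.ascii_uppercase  # 62 symbols


def base62_encode(n: int, width: int = 7) -> str:
    """Encode a non-negative integer as a fixed-width base-62 string."""
    if n < 0 or n >= 62 ** width:
        raise ValueError("n does not fit in the requested width")
    chars = []
    for _ in range(width):
        n, remainder = divmod(n, 62)
        chars.append(BASE62[remainder])
    # Digits were produced least-significant first, so reverse them
    return "".join(reversed(chars))


print(10 ** 7)             # 10000000 (10 million)
print(62 ** 7)             # 3521614606208 (about 3.5 trillion)
print(base62_encode(0))    # 0000000
print(base62_encode(61))   # 000000Z
print(base62_encode(62))   # 0000010
```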
Original URL => "A"1XXXX
Original URL => "A"3XXXX
Original URL => "A"4XXXX
Original URL => "B"1XXXX
"A"1XXXX => 1
"A"3XXXX => 1
"A"4XXXX => 2
"B"1XXXX => 2
Client requests an expired link:
delete the link and return an error code
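This lazy cleanup can be sketched as a check on the read path. The sketch is illustrative only: the `resolve` function, the dictionary-backed store, and the specific status codes (302, 404, 410) are assumptions, since the slides say only that an error code is returned.

```python
from datetime import datetime, timedelta, timezone


def resolve(store, url_key, now=None):
    """Look up a short key; delete it and return an error if it has expired.

    Returns a (status_code, original_url_or_None) tuple.
    """
    now = now or datetime.now(timezone.utc)
    record = store.get(url_key)
    if record is None:
        return (404, None)  # unknown key
    expire_date = record.get("expire_date")
    if expire_date is not None and expire_date <= now:
        del store[url_key]  # lazy cleanup on access
        return (410, None)  # link has expired
    return (302, record["original_url"])  # redirect to the original URL


now = datetime.now(timezone.utc)
store = {
    "old": {"original_url": "https://example.com/a", "expire_date": now - timedelta(days=1)},
    "new": {"original_url": "https://example.com/b", "expire_date": None},
}
print(resolve(store, "old", now))  # (410, None)
print(resolve(store, "new", now))  # (302, 'https://example.com/b')
print("old" in store)              # False
```

Links that are never requested again would stay in storage under this scheme, so a periodic background sweep is a common complement.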