Design a system for TinyURL

from

Grokking the System Design Interview

Why should we learn SD?

Common interview questions

  • Design Twitter
  • Design YouTube
  • Design Instagram
  • Design Netflix

Performance in these interviews reflects upon your ability to work with complex systems and translates into the position and salary the interviewing company offers you.

- Educative.io

Who should learn SD?

  • Frontend Engineer
  • Backend Engineer
  • Data Engineer
  • Site Reliability Engineer
  • Software Engineer, etc.

Common topics for SD:

  • QPS
  • Load Balancer
  • Web Server
  • Database Server
  • Schema Design
  • SQL vs NoSQL
  • Caching
  • DB Replica
  • Scalability, etc.

What is TinyURL?

Today's outline

  • Requirements and Goals
  • Capacity Estimation and Constraints
  • System APIs
  • Database Design
  • Basic System Design and Algorithm
  • Data Partitioning and Replication
  • Cache
  • Load Balancer
  • Purging or DB cleanup

Requirements and Goals

  • Functional Requirements
  • Non-Functional Requirements

Functional Requirements

  • Given a URL, generate a shorter, unique alias for it.
  • When a user accesses a short link, redirect them to the original URL.
  • Users should be able to pick a custom short link.
  • Links will expire after a standard default timespan.

Non-Functional Requirements

  • This system should be highly available.
  • URL redirection should happen in real-time with minimal latency.
  • Shortened links should not be predictable.

Capacity Estimation and Constraints

  • Traffic estimates
  • Storage estimates
  • Bandwidth estimates
  • Memory estimates

Assumptions:

  1. 500M new URL shortenings/month
  2. 100:1 R/W ratio
  3. Store every URL for 5 years

Traffic estimates:

Expected redirections per month:

100 * 500M => 50B

New URL shortenings QPS:

500M / (30d * 24h * 3600s) = ~200 URLs/s

URL redirections QPS: (R:W = 100:1)

100 * 200 URLs/s = 20K/s

Storage estimates:

Total number of objects:

500M * 12 months * 5 years = 30B

Assuming each object is 500 bytes, we will need:

30B * 500 bytes = 15 TB

Bandwidth estimates:

Write:

200 * 500 bytes = 100 KB/s

 

Read:

20K * 500 bytes = ~10 MB/s

Memory estimates:

Follow the 80-20 rule: 20% of URLs generate 80% of traffic, so we cache the hottest 20%.

 

Requests per day:

20K * 3600s * 24h = ~1.7B

 

Cache Size:

0.2 * 1.7B * 500 bytes = ~170GB
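The estimates above can be reproduced with a few lines of arithmetic. The following is a minimal sketch using only the slide's assumptions; the constant and function names are illustrative and not part of the original deck.

```python
# Back-of-the-envelope capacity estimates from the slide's assumptions.
NEW_URLS_PER_MONTH = 500_000_000   # 500M new shortenings per month
READ_WRITE_RATIO = 100             # 100 redirections per new URL
RETENTION_YEARS = 5                # every URL is stored for 5 years
OBJECT_SIZE_BYTES = 500            # assumed size of one stored object
SECONDS_PER_MONTH = 30 * 24 * 3600
SECONDS_PER_DAY = 24 * 3600
HOT_FRACTION = 0.2                 # 80-20 rule: cache the hottest 20%


def estimate():
    """Return the traffic, storage, bandwidth, and memory estimates."""
    write_qps = NEW_URLS_PER_MONTH / SECONDS_PER_MONTH
    read_qps = write_qps * READ_WRITE_RATIO
    total_objects = NEW_URLS_PER_MONTH * 12 * RETENTION_YEARS
    requests_per_day = read_qps * SECONDS_PER_DAY
    return {
        "write_qps": write_qps,
        "read_qps": read_qps,
        "total_objects": total_objects,
        "storage_bytes": total_objects * OBJECT_SIZE_BYTES,
        "write_bandwidth_bytes_per_s": write_qps * OBJECT_SIZE_BYTES,
        "read_bandwidth_bytes_per_s": read_qps * OBJECT_SIZE_BYTES,
        "requests_per_day": requests_per_day,
        "cache_bytes": HOT_FRACTION * requests_per_day * OBJECT_SIZE_BYTES,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:,.0f}")
```

Without intermediate rounding, the figures come out slightly below the slide's rounded values: about 193 writes/s, 19.3K reads/s, 1.67B requests/day, and 167 GB of cache.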

System APIs

Once we've finalized the requirements, it's always a good idea to define the system APIs.

Choose a way to expose the API:

  1. SOAP
  2. REST APIs

createURL(api_dev_key, original_url, custom_alias=None, user_name=None,
          expire_date=None)

Returns: (string)

A successful insertion returns the shortened URL; otherwise, it returns an error code.

deleteURL(api_dev_key, url_key)

“url_key”: string representing the shortened URL.

Returns: (‘URL Removed’)
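To make the two calls concrete, here is a minimal in-memory sketch of their behavior. It is an illustration under stated assumptions, not the service's actual implementation: the class name `UrlService`, the `BASE_URL` value, the error code strings, and the random key generation are all hypothetical, and the expiry parameter is spelled `expire_date` here. The `api_dev_key` is accepted but not checked.

```python
import secrets
import string

BASE_URL = "https://tiny.example/"        # hypothetical short-link domain
ERROR_ALIAS_TAKEN = "ERROR_ALIAS_TAKEN"   # hypothetical error codes
ERROR_NOT_FOUND = "ERROR_NOT_FOUND"
ALPHABET = string.digits + string.ascii_letters  # 62 characters
KEY_LENGTH = 7


class UrlService:
    """In-memory stand-in for the createURL / deleteURL API."""

    def __init__(self):
        self._store = {}  # url_key -> record

    def _new_key(self):
        # Draw random 7-character keys until an unused one is found.
        while True:
            key = "".join(secrets.choice(ALPHABET) for _ in range(KEY_LENGTH))
            if key not in self._store:
                return key

    def create_url(self, api_dev_key, original_url, custom_alias=None,
                   user_name=None, expire_date=None):
        # api_dev_key is unused in this sketch; a real service might use it
        # to identify and throttle callers.
        if custom_alias is not None:
            if custom_alias in self._store:
                return ERROR_ALIAS_TAKEN
            key = custom_alias
        else:
            key = self._new_key()
        self._store[key] = {
            "original_url": original_url,
            "user_name": user_name,
            "expire_date": expire_date,
        }
        return BASE_URL + key

    def delete_url(self, api_dev_key, url_key):
        if self._store.pop(url_key, None) is None:
            return ERROR_NOT_FOUND
        return "URL Removed"
```

Usage: `UrlService().create_url("dev-key", "https://example.com/a", custom_alias="mylink")` returns the short URL built from `BASE_URL` and the alias; a second call with the same alias returns the error code instead.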

Database Design

Defining the DB schema early in the system design interview (SDI) helps clarify the data flow among the various components.

Observations:

  1. We need to store billions of records.
  2. Each object we store is small.
  3. There are no relationships between records.
  4. Our service is read-heavy.

DB Schema:

What kind of DB should we use?

  • Storing billions of rows
  • No relationships
  • Better scalability

NoSQL: DynamoDB, Cassandra, Riak

Basic System Design and Algorithm

URL shortening logic

  1. Base 62: Take an integer (int/long) and encode it as a base62 string made of digits and lowercase/uppercase letters.
  2. MD5: Take a unique string, hash it, and encode the hash as a base62 string.

 

Why use base 62 instead of base 10? (with 7 characters)

Base 10 => 10^7, i.e. 10 million combinations

Base 62 => 62^7, i.e. about 3.5 trillion combinations
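Both options can be sketched in a few lines. The following is a minimal illustration, assuming the alphabet order digits, lowercase, uppercase and a 7-character key; the function names are hypothetical. Note that truncating an MD5 digest to 7 characters can produce collisions, so a real service would still need a uniqueness check.

```python
import hashlib
import string

# 62 symbols: 0-9, a-z, A-Z (the ordering is an assumption of this sketch).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def base62_encode(n):
    """Encode a non-negative integer as a base62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, remainder = divmod(n, BASE)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def md5_key(original_url, length=7):
    """Hash the URL with MD5, base62-encode the digest, keep `length` chars."""
    digest = hashlib.md5(original_url.encode("utf-8")).digest()
    return base62_encode(int.from_bytes(digest, "big"))[:length]


if __name__ == "__main__":
    print(base62_encode(62))                # "10"
    print(f"{10 ** 7:,} vs {62 ** 7:,}")    # key space: base 10 vs base 62
    print(md5_key("https://example.com/some/long/path"))
```

The key-space line prints 10,000,000 against 3,521,614,606,208, which is the "10 million vs 3.5 trillion" comparison from the slide.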

Data Partitioning and Replication

  • Range Based Partitioning
  • Hash-Based Partitioning

Range Based Partitioning

Original URL => "A"1XXXX

Original URL => "A"3XXXX

Original URL => "A"4XXXX

Original URL => "B"1XXXX

Hash Based Partitioning

"A"1XXXX => 1

"A"3XXXX => 1

"A"4XXXX => 2

"B"1XXXX => 2

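The two schemes can be contrasted with a short sketch. This is an illustration under assumptions: range-based partitioning is keyed on the first character of the key, as in the examples above, and hash-based partitioning uses MD5 modulo the number of partitions. The function names and the partition count are hypothetical.

```python
import hashlib


def range_partition(url_key):
    """Range-based: keys that share a first character share a partition."""
    return url_key[0].upper()


def hash_partition(url_key, num_partitions):
    """Hash-based: a stable hash of the key, modulo the partition count."""
    # Python's built-in hash() is salted per process for strings, so a
    # stable digest is used to get the same partition on every run.
    digest = hashlib.md5(url_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


if __name__ == "__main__":
    for key in ["A1XXXX", "A3XXXX", "A4XXXX", "B1XXXX"]:
        print(key, range_partition(key), hash_partition(key, 2))
```

Range-based partitioning keeps related keys together but can leave partitions unevenly loaded when some first characters are far more common; hashing spreads keys more evenly at the cost of locality.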
Cache

  1. How much memory should we have?
    • Estimate with the 80-20 rule.
  2. Which cache eviction policy would best fit our needs?
    • Use LRU: when the cache is full, evict the least recently used link to make room for a newer URL.
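As an illustration of the LRU policy, here is a minimal sketch built on `collections.OrderedDict` from the standard library. The class name and the tiny capacity are illustrative only; a production cache would typically be a dedicated cache server rather than an in-process dictionary.

```python
from collections import OrderedDict


class LRUCache:
    """Maps url_key -> original_url, evicting the least recently used entry."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, url_key):
        """Return the cached URL, or None on a cache miss."""
        if url_key not in self._items:
            return None
        # A hit makes this entry the most recently used one.
        self._items.move_to_end(url_key)
        return self._items[url_key]

    def put(self, url_key, original_url):
        """Insert or refresh an entry, evicting the oldest if over capacity."""
        self._items[url_key] = original_url
        self._items.move_to_end(url_key)
        if len(self._items) > self.capacity:
            # last=False pops from the least recently used end.
            self._items.popitem(last=False)


if __name__ == "__main__":
    cache = LRUCache(capacity=2)
    cache.put("aaa", "https://example.com/a")
    cache.put("bbb", "https://example.com/b")
    cache.get("aaa")                           # "aaa" is now most recent
    cache.put("ccc", "https://example.com/c")  # evicts "bbb"
    print(cache.get("bbb"))                    # None
```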

Load Balancer

  1. Between Clients and Application servers
  2. Between Application Servers and database servers
  3. Between Application Servers and Cache servers

Purging or DB cleanup

  1. Should records stick around forever or be purged?
  2. If a user-specified expiration time is reached, what should happen to the link?

Lazy cleanup

  • Avoid putting a lot of pressure on the database
  • Delete only links that have expired
  • Some expired links can live longer but are never returned to users

Client requests an expired link:

delete the link and return an error code
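The lazy cleanup flow can be sketched as a lookup that checks expiry on access. This is a minimal in-memory illustration, not the deck's implementation: the `LinkStore` class, the error code strings, and the injectable `now` parameter are assumptions made so the example is self-contained and testable.

```python
from datetime import datetime, timedelta

ERROR_NOT_FOUND = "ERROR_NOT_FOUND"   # hypothetical error codes
ERROR_EXPIRED = "ERROR_EXPIRED"


class LinkStore:
    """Expired links are deleted only when a client requests them."""

    def __init__(self):
        self._links = {}  # url_key -> (original_url, expire_date)

    def add(self, url_key, original_url, expire_date=None):
        self._links[url_key] = (original_url, expire_date)

    def resolve(self, url_key, now=None):
        """Return the original URL, or an error code if missing or expired."""
        now = now if now is not None else datetime.now()
        record = self._links.get(url_key)
        if record is None:
            return ERROR_NOT_FOUND
        original_url, expire_date = record
        if expire_date is not None and expire_date <= now:
            # Lazy cleanup: remove the expired link on access and never
            # return it to the user.
            del self._links[url_key]
            return ERROR_EXPIRED
        return original_url

    def __len__(self):
        return len(self._links)


if __name__ == "__main__":
    start = datetime(2024, 1, 1)
    store = LinkStore()
    store.add("abc1234", "https://example.com", start + timedelta(days=1))
    print(store.resolve("abc1234", now=start))                      # URL
    print(store.resolve("abc1234", now=start + timedelta(days=2)))  # expired
```

Links that are never requested again stay in storage until some other cleanup removes them, which is the trade-off the bullets above describe.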

Final design
