Unique ID Generators in Distributed Systems
Backend Study Group
11/01
15:00~16:30
@Universe Tech
Unique IDs
What are the requirements for unique IDs?
What are the considerations for unique IDs?
Considerations
- Length of the ID
- Numeric only?
- Sortable?
- Can IDs generated on different machines be told apart?
- Collision rate
- Number of IDs generated per second
- Scalability
Requirements in this case:
- IDs must be unique.
- IDs are numerical values only.
- IDs fit into 64 bits.
- IDs are ordered by date.
- Ability to generate over 10,000 unique IDs per second.
Multi-Master Replication
- Uses the databases' auto_increment feature
- Instead of increasing by 1, each server increases its next ID by k, where k is the number of database servers
- Hard to scale with multiple data centers
- IDs do not go up with time across multiple servers
- Does not scale well when a server is added or removed
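A minimal sketch of the step-k scheme, assuming k database servers where server i starts at i and adds k each time (in MySQL this corresponds to the auto_increment_offset and auto_increment_increment settings). The function names are illustrative only.

```python
from itertools import count, islice


def server_id_stream(server_index: int, num_servers: int):
    """Yield IDs for one server: start at its 1-based index, then step by k = num_servers."""
    return count(start=server_index, step=num_servers)


def take(stream, n: int) -> list[int]:
    """Return the first n IDs from a stream."""
    return list(islice(stream, n))


if __name__ == "__main__":
    k = 3  # hypothetical number of database servers
    for i in range(1, k + 1):
        print(f"server {i}:", take(server_id_stream(i, k), 4))
```

With k = 3 the servers hand out 1, 4, 7, ...; 2, 5, 8, ...; and 3, 6, 9, .... Adding a fourth server changes k for every server, which is why the scheme does not scale well when servers are added or removed.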
UUID
- Universally Unique Identifier
- 128 bits long, so it does not fit the 64-bit requirement
UUID v1
UUID v4
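To compare the two UUID versions, Python's standard `uuid` module can generate both: `uuid1()` builds a time-based UUID from a timestamp, a clock sequence, and a node ID (usually the MAC address), while `uuid4()` is random. This sketch only prints them and checks their size against the 64-bit requirement.

```python
import uuid

# UUID v1: time-based (timestamp + clock sequence + node ID, usually the MAC address)
u1 = uuid.uuid1()
# UUID v4: random (122 random bits; the other 6 bits encode version and variant)
u4 = uuid.uuid4()

for u in (u1, u4):
    # .int is the full 128-bit value behind the 36-character string form
    print(u, "version:", u.version, "bits used:", u.int.bit_length())

# The version nibble sits in the upper 64 bits, so a UUID never fits in 64 bits
print("fits in 64 bits?", u4.int < 2**64)
```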
Ticket Server
- Centralized auto_increment
- Numeric
- Single point of failure
- Poor scalability
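A minimal in-process sketch of the ticket-server idea, assuming a single lock-protected counter stands in for the centralized database's auto_increment and threads stand in for web servers. `TicketServer` and `worker` are hypothetical names.

```python
import threading


class TicketServer:
    """In-memory stand-in for a centralized auto_increment ticket server.

    Every caller goes through this one object (single point of failure),
    and every request contends for the same lock (scalability bottleneck).
    """

    def __init__(self, start: int = 1) -> None:
        self._next_id = start
        self._lock = threading.Lock()

    def next_id(self) -> int:
        # The lock plays the role of the database's atomic auto_increment
        with self._lock:
            ticket = self._next_id
            self._next_id += 1
            return ticket


def worker(server: TicketServer, n: int, out: list[int]) -> None:
    """Simulate one web server fetching n IDs from the ticket server."""
    for _ in range(n):
        out.append(server.next_id())


if __name__ == "__main__":
    server = TicketServer()
    results: list[int] = []
    threads = [threading.Thread(target=worker, args=(server, 1000, results)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # 4000 numeric IDs, all unique
    print(len(results), len(set(results)), min(results), max(results))
```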
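Twitter's Snowflake
- 64 bits long: 1 sign bit, 41-bit timestamp, 5-bit datacenter ID, 5-bit machine ID, 12-bit sequence number
- Numeric
- The 41-bit millisecond timestamp covers 2^41 ms ≈ 69 years
- Datacenter IDs and machine IDs are chosen at startup time
- The 12-bit sequence number lets a machine generate at most 2^12 = 4096 new IDs per millisecond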
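A minimal single-process sketch of a Snowflake-style generator, assuming the 1 + 41 + 5 + 5 + 12 bit layout above and a hypothetical custom epoch of 2020-01-01 UTC. It is not Twitter's implementation: on a backwards clock jump it simply raises an error, and once 4096 IDs have been issued in one millisecond it waits for the next millisecond.

```python
import threading
import time

# Assumed layout: 1 sign bit (always 0) | 41-bit timestamp | 5-bit datacenter ID
# | 5-bit machine ID | 12-bit sequence number  (1 + 41 + 5 + 5 + 12 = 64)
DATACENTER_BITS = 5
MACHINE_BITS = 5
SEQUENCE_BITS = 12

MAX_DATACENTER_ID = (1 << DATACENTER_BITS) - 1  # 31
MAX_MACHINE_ID = (1 << MACHINE_BITS) - 1        # 31
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1         # 4095

MACHINE_SHIFT = SEQUENCE_BITS                         # 12
DATACENTER_SHIFT = MACHINE_SHIFT + MACHINE_BITS       # 17
TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS  # 22

# Hypothetical custom epoch: 2020-01-01T00:00:00Z, in milliseconds
DEFAULT_EPOCH_MS = 1_577_836_800_000


class Snowflake:
    def __init__(self, datacenter_id: int, machine_id: int, epoch_ms: int = DEFAULT_EPOCH_MS) -> None:
        # Datacenter and machine IDs are fixed when the generator starts
        if not 0 <= datacenter_id <= MAX_DATACENTER_ID:
            raise ValueError("datacenter_id out of range")
        if not 0 <= machine_id <= MAX_MACHINE_ID:
            raise ValueError("machine_id out of range")
        self.datacenter_id = datacenter_id
        self.machine_id = machine_id
        self.epoch_ms = epoch_ms
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def _now_ms(self) -> int:
        return time.time_ns() // 1_000_000 - self.epoch_ms

    def next_id(self) -> int:
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                # Clock moved backwards: refuse rather than risk duplicate IDs
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # 4096 IDs already issued this millisecond: spin until the next one
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                (now << TIMESTAMP_SHIFT)
                | (self.datacenter_id << DATACENTER_SHIFT)
                | (self.machine_id << MACHINE_SHIFT)
                | self._sequence
            )


def parse(snowflake_id: int) -> dict[str, int]:
    """Split an ID back into its fields."""
    return {
        "timestamp_ms": snowflake_id >> TIMESTAMP_SHIFT,
        "datacenter_id": (snowflake_id >> DATACENTER_SHIFT) & MAX_DATACENTER_ID,
        "machine_id": (snowflake_id >> MACHINE_SHIFT) & MAX_MACHINE_ID,
        "sequence": snowflake_id & MAX_SEQUENCE,
    }
```

Because the timestamp occupies the high bits, IDs from one generator are numeric, fit in 64 bits, and increase over time, which matches the requirements listed earlier.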
Twitter's Snowflake
- Clock synchronization is crucial: if a machine's clock drifts or moves backwards, it can generate duplicate or out-of-order IDs
- Getting the time from the system costs more than random generation
Alternatives
- UIDGenerator by Baidu
- Leaf-Segment by Meituan
- Leaf-Snowflake by Meituan
- Seqsvr by Tencent
Design a URL Shortener
Requirements
- URL shortening
- URL redirecting
- High availability
- Scalability
- Fault tolerance
Estimation
- Write operations: 100 million URLs are generated per day.
- Write operations per second: 100 million / 24 / 3600 ≈ 1,160
- Read operations: assuming a read-to-write ratio of 10:1, read operations per second: 1,160 * 10 = 11,600
- Assuming the URL shortener service runs for 10 years, it must support 100 million * 365 * 10 = 365 billion records
- Assume the average URL length is 100 bytes.
- Storage requirement over 10 years: 365 billion * 100 bytes = 36.5 TB (the 10 years are already counted in the 365 billion records)
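The estimation arithmetic can be checked in a few lines; every input below is one of the assumptions listed above.

```python
# Back-of-the-envelope numbers; all inputs are the slide's assumptions
URLS_PER_DAY = 100_000_000
READ_WRITE_RATIO = 10
YEARS = 10
AVG_URL_BYTES = 100
SECONDS_PER_DAY = 24 * 3600

writes_per_sec = URLS_PER_DAY / SECONDS_PER_DAY   # ~1,157, rounded to 1,160 above
reads_per_sec = writes_per_sec * READ_WRITE_RATIO  # ~11,574
records = URLS_PER_DAY * 365 * YEARS               # 365 billion
storage_bytes = records * AVG_URL_BYTES            # 10 years are already inside `records`

print(f"writes/s: {writes_per_sec:,.0f}")
print(f"reads/s:  {reads_per_sec:,.0f}")
print(f"records:  {records:,}")
print(f"storage:  {storage_bytes / 1e12:.1f} TB")
```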
API Endpoints
- URL shortening. To create a new short URL, a client sends a POST request containing one parameter: the original long URL. The API looks like this:
  POST api/v1/data/shorten
  - Request parameter: {longUrl: longURLString}
  - Returns shortURL
- URL redirecting. To redirect a short URL to the corresponding long URL, a client sends a GET request. The API looks like this:
  GET api/v1/shortUrl
  - Returns longURL for HTTP redirection
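A minimal sketch of the two endpoints using Python's standard `http.server`, assuming a JSON request body, an in-memory dict in place of the database, and a placeholder 7-character MD5-based code. The 201 and 302 status codes are choices made for this sketch; a 301 redirect would also work but lets browsers cache the mapping.

```python
import hashlib
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Optional

SHORTEN_PATH = "/api/v1/data/shorten"  # POST endpoint from the slide
REDIRECT_PREFIX = "/api/v1/"           # GET api/v1/{shortUrl}

# Hypothetical in-memory stand-in for the <shortURL, longURL> table
URL_TABLE: dict[str, str] = {}


def shorten_url(long_url: str) -> str:
    # Placeholder code generator: first 7 hex chars of MD5 (collisions ignored here)
    short_url = hashlib.md5(long_url.encode("utf-8")).hexdigest()[:7]
    URL_TABLE[short_url] = long_url
    return short_url


def lookup(short_url: str) -> Optional[str]:
    return URL_TABLE.get(short_url)


class ShortenerHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != SHORTEN_PATH:
            self.send_error(404)
            return
        try:
            length = int(self.headers.get("Content-Length", 0))
            long_url = json.loads(self.rfile.read(length))["longUrl"]
            if not isinstance(long_url, str):
                raise TypeError("longUrl must be a string")
        except (ValueError, KeyError, TypeError):
            self.send_error(400, 'expected JSON body {"longUrl": "..."}')
            return
        body = shorten_url(long_url).encode("utf-8")
        self.send_response(201)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self) -> None:
        if not self.path.startswith(REDIRECT_PREFIX):
            self.send_error(404)
            return
        long_url = lookup(self.path[len(REDIRECT_PREFIX):])
        if long_url is None:
            self.send_error(404)
            return
        # 302 sends every click back through the service; 301 lets browsers cache it
        self.send_response(302)
        self.send_header("Location", long_url)
        self.send_header("Content-Length", "0")
        self.end_headers()
```

To try it locally, run `ThreadingHTTPServer(("127.0.0.1", 8080), ShortenerHandler).serve_forever()`, POST `{"longUrl": "..."}` to `/api/v1/data/shorten`, then GET `/api/v1/<code>`.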
URL Redirecting
URL Shortening
- Each longURL must be hashed to one hashValue
- Each hashValue can be mapped back to the longURL
Design for URL Shortening
- Data model: <shortURL, longURL> mapping in a relational database (RDB)
- Hash function
  - Hash + collision resolution
  - Base 62 conversion
- Hash value length
  - Characters come from [0-9, a-z, A-Z], i.e. 10 + 26 + 26 = 62 possible characters
  - Find the smallest n such that 62^n ≥ 365 billion
  - n = 6 gives 62^6 ≈ 56.8 billion, which is not enough; n = 7 gives 62^7 ≈ 3.5 trillion, so the hash value is 7 characters long
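The smallest hash length can also be found by direct search, using the 365 billion records from the estimation.

```python
def min_code_length(num_records: int, alphabet_size: int = 62) -> int:
    """Smallest n such that alphabet_size ** n >= num_records."""
    n = 1
    while alphabet_size ** n < num_records:
        n += 1
    return n


RECORDS = 365_000_000_000  # 100 million URLs/day * 365 days * 10 years

n = min_code_length(RECORDS)
print(n, 62 ** (n - 1), 62 ** n)  # 7 56800235584 3521614606208
```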
Hash + Collision Resolution
- If the hash value collides with an existing shortURL, recursively append a new predefined string to the longURL and hash again until there is no collision
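A minimal sketch of hash + collision resolution, assuming MD5 as the hash (the slides do not name one), a dict in place of the database table, and a hypothetical predefined string "!". The retry loop is an iterative form of the recursive append.

```python
import hashlib

SHORT_LEN = 7             # from the hash value length calculation
PREDEFINED_SUFFIX = "!"   # hypothetical predefined string appended on collision


def hash_prefix(text: str) -> str:
    # MD5 is an assumption; its hex digest only uses [0-9a-f], so 7 hex chars give
    # 16^7 ≈ 268 million values, far fewer than 62^7 (a real design would use base 62)
    return hashlib.md5(text.encode("utf-8")).hexdigest()[:SHORT_LEN]


def shorten(long_url: str, table: dict[str, str]) -> str:
    """`table` stands in for the <shortURL, longURL> database table."""
    candidate_input = long_url
    while True:
        short_url = hash_prefix(candidate_input)
        existing = table.get(short_url)
        if existing is None:
            table[short_url] = long_url  # store the ORIGINAL long URL
            return short_url
        if existing == long_url:
            return short_url             # this long URL was shortened earlier
        # Collision with a different long URL: append the predefined string and rehash
        candidate_input += PREDEFINED_SUFFIX
```

Each retry costs another database lookup, which is why checking whether a hash already exists efficiently matters for this approach.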
Base 62 conversion
- Commonly used for URL shorteners
- Example: https://tinyurl.com/2TX
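A minimal base 62 encoder and decoder, assuming the digit order 0-9, a-z, A-Z (the slides give the character set but not its order). With this order the ID 11157 encodes to "2TX", the code in the example URL.

```python
import string

# Assumed digit order: 0-9 -> 0..9, a-z -> 10..35, A-Z -> 36..61
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62


def encode_base62(num: int) -> str:
    """Convert a non-negative integer ID to a base 62 string."""
    if num < 0:
        raise ValueError("ID must be non-negative")
    if num == 0:
        return ALPHABET[0]
    digits = []
    while num > 0:
        num, remainder = divmod(num, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def decode_base62(text: str) -> int:
    """Convert a base 62 string back to the integer ID."""
    num = 0
    for char in text:
        num = num * BASE + ALPHABET.index(char)
    return num


print(encode_base62(11157))  # 2TX
print(decode_base62("2TX"))  # 11157
```

Unlike hashing, this mapping is one-to-one, so distinct IDs never collide; the length of the code grows with the ID instead of being fixed.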
Comparison of the Two Approaches
Chosen Flow
- Assume the input longURL is https://en.wikipedia.org/wiki/Systems_design
- The unique ID generator returns ID 2009215674938
- Convert the ID to a shortURL using base 62 conversion: ID 2009215674938 is converted to "zn9edcu"
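The chosen flow can be sketched as follows, assuming `next_id` is any unique ID generator (for instance a Snowflake-style one) and two dicts stand in for the database. The lookup by longURL is an assumed extra step so that the same long URL is not shortened twice; `UrlShortener` is a hypothetical name.

```python
import string
from typing import Callable

# Assumed digit order for base 62: 0-9, a-z, A-Z
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def to_base62(num: int) -> str:
    digits = []
    while True:
        num, rem = divmod(num, 62)
        digits.append(ALPHABET[rem])
        if num == 0:
            return "".join(reversed(digits))


class UrlShortener:
    """Unique ID generator + base 62 conversion."""

    def __init__(self, next_id: Callable[[], int]) -> None:
        self.next_id = next_id  # stand-in for a distributed unique ID generator
        self.short_to_long: dict[str, str] = {}
        self.long_to_short: dict[str, str] = {}

    def shorten(self, long_url: str) -> str:
        # Assumed extra step: reuse the existing shortURL for a known longURL
        if long_url in self.long_to_short:
            return self.long_to_short[long_url]
        short_url = to_base62(self.next_id())
        self.short_to_long[short_url] = long_url
        self.long_to_short[long_url] = short_url
        return short_url


ids = iter([2009215674938])  # hypothetical generator returning the slide's ID
service = UrlShortener(lambda: next(ids))
print(service.shorten("https://en.wikipedia.org/wiki/Systems_design"))  # zn9edcu
```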
Design for URL Redirecting
- <shortURL, longURL> mappings can be stored in a cache to improve read performance
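A minimal cache-aside sketch for the redirect path, assuming an LRU eviction policy and a dict in place of the database. `LRUCache` and `Redirector` are hypothetical names; a production system would more likely use a shared cache service.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Tiny in-process LRU cache."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: str, value: str) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


class Redirector:
    """Check the cache first; on a miss, read the database and fill the cache."""

    def __init__(self, db: dict[str, str], cache: LRUCache) -> None:
        self.db = db  # hypothetical <shortURL, longURL> table
        self.cache = cache
        self.db_reads = 0

    def resolve(self, short_url: str) -> Optional[str]:
        long_url = self.cache.get(short_url)
        if long_url is not None:
            return long_url
        self.db_reads += 1
        long_url = self.db.get(short_url)
        if long_url is not None:  # unknown codes are not cached in this sketch
            self.cache.put(short_url, long_url)
        return long_url
```

With a 10:1 read-to-write ratio, repeated clicks on popular short URLs are served from the cache and only the first lookup reaches the database.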
Discussions
- Database: RDB or NoSQL?
- Scalability: database and web tier
- How to efficiently check whether a URL already exists in the database?
- How to prevent malicious scanning of short URLs?
- Short URL validation
- Rate limiting
- How to cache effectively?
Discussion
Unique ID Generator in Distributed Systems - Backend Study Group
By Albert Chen