Design Lesson 2
Copyright © 直通硅谷
http://www.zhitongguigu.com
Design pieces
- Accuracy
- Performance
- Robustness
- Scalability
- Test
Copyright © 直通硅谷
http://www.zhitongguigu.com
Knowledge pieces
- Database
- Network,Server,TCP/IP
- Security, Encryption/Decryption
- Distributed System
- Backup System
- *Hadoop
Copyright © 直通硅谷
http://www.zhitongguigu.com
Copyright © 直通硅谷
http://www.zhitongguigu.com
BBS System
BBS System
- User could sign up by username, email
- User could edit its own profile
- User could post
- User could reply to the post
- User could like some post
- User could go to someone's profile to see all the posts
Copyright © 直通硅谷
http://www.zhitongguigu.com
First Question:
Copyright © 直通硅谷
http://www.zhitongguigu.com
What is the key point for this question?
Database Design
BBS System
Copyright © 直通硅谷
http://www.zhitongguigu.com
What do we need?
First: Extract basic elements
User, Post, Profile, Reply
Copyright © 直通硅谷
http://www.zhitongguigu.com
Most Important question
Copyright © 直通硅谷
http://www.zhitongguigu.com
Design endpoint to fulfill these requirements
Backend: How to join the table to give the results
endpoint to fulfill the requirements
Copyright © 直通硅谷
http://www.zhitongguigu.com
- Sign up
- Sign in
- Profile Edit
- Post
- Reply
- Like
- Unlike
- Edit Post/Reply
How to send the request
Copyright © 直通硅谷
http://www.zhitongguigu.com
Apache
Nginx
Lighttpd
IIS
Tomcat
Jetty
Netty
How to join the table to give the results
Copyright © 直通硅谷
http://www.zhitongguigu.com
- Sign up
- Sign in
- Profile Edit
- Post
- Reply
- Like
- Unlike
- Edit Post/Reply
Basic Operation of SQL
Copyright © 直通硅谷
http://www.zhitongguigu.com
- Select
- From
- Where
- Join
- Left Join
- Right Join
- Outer Join
- Delete
- Create
- Insert Into
- Update
- Distinct
- Count
- Min
- Max
- Avg
Follow up question
- If the user can see who likes the post?
- If some one could reply other's reply? (like wechat discovery)
Copyright © 直通硅谷
http://www.zhitongguigu.com
Copyright © 直通硅谷
http://www.zhitongguigu.com
Scalability (System Design)
- How to improve the system if there are thousands of replies?
- How to improve the system if there are thousands of users?
Copyright © 直通硅谷
http://www.zhitongguigu.com
System Design
Database Sharding
- Vertical partitioning. Normalization is one way of vertical partition. We split columns into different tables. Or even further, we have frequent used table of data stored on a faster machine.
- Horizontal partitioning. (Shard). Instead of vertical split the columns, we split the rows. The tables are divided and distributed into different server.
Copyright © 直通硅谷
http://www.zhitongguigu.com
System Design
Scaling up
- Vertical Scaling
- CPU, Disk(PATA, SATA, SSD), RAM
- Horizontal Scaling
- Load Balancer
- Database Replication
Copyright © 直通硅谷
http://www.zhitongguigu.com
System Design
Load Balancer
- load balancer has public IP address for DNS to return
- backend worker servers has private IP address for safety
- How to divide work?
- determine which is busy
- divide by work category
- round robin(different ip address at different times)
- login again and again
- Write session to hard disk
- Sticky Session
- login again and again
Copyright © 直通硅谷
http://www.zhitongguigu.com
System Design
Server & Database Replication
- Master-Slave
- Good for read-heavy websites.
- Free backup
- Slave become master if master is down
- Master-Master
- Prevent master down
Copyright © 直通硅谷
http://www.zhitongguigu.com
System Design
Caching
Fetching from DB is slow
- Caching queries
- Most commonly used
- Query (hash) -> result
- If one cell is updated, delete all cached queries
- Caching objects (Redis)
- See data as an object
- Build dataset from DB and store the whole. (page)
- Activity steams
- A smaller but faster DB in mem.
Copyright © 直通硅谷
http://www.zhitongguigu.com
System Design
CDN
Content Distributed Network
- CDNs replicate content to multiple places.
- Closer, less hops
- Amazon CloudFront
Copyright © 直通硅谷
http://www.zhitongguigu.com
How to improve the system if there are thousands of replies/users?
Copyright © 直通硅谷
http://www.zhitongguigu.com
- Pagination: fetch the data as pages
- Async: preload the most important data
- Sharding: horizontal partition of data in a database or search engine
Short URL
Copyright © 直通硅谷
http://www.zhitongguigu.com
Short URL
Copyright © 直通硅谷
http://www.zhitongguigu.com
In Twitter, Weibo, you have charactor limit. Also normal URL will contain some invalid ASCII code that you cannot put into the content.
So a lot of company has its own Short URL
Example: http://t.cn/RGKGR3T
Short URL
Copyright © 直通硅谷
http://www.zhitongguigu.com
Design A System that can generate this Short URL
- The user input an URL, the server returns a short one
- Other user click this short URL, it will redirect to the real URL
- If the URL has not been used (no one clicks for 72 hours, configurable), it will be removed
How to generate a short URL
Copyright © 直通硅谷
http://www.zhitongguigu.com
UUID
MD5
Are these too long? Could we find a better way?
Think about the requirement
one URL could be used released after some day
How to generate a short URL
Copyright © 直通硅谷
http://www.zhitongguigu.com
We do not generate a short URL everytime, we just lookup the table
How to send request
Copyright © 直通硅谷
http://www.zhitongguigu.com
Apache Tomcat
Design a Database
Copyright © 直通硅谷
http://www.zhitongguigu.com
Short URL -> Long URL, time
Pretty Easy? If we have a high QPS?
What DB we should use?
Copyright © 直通硅谷
http://www.zhitongguigu.com
Relational Database: Oracle, MySql
Non-Relational Database (NO Sql): MangoDB, CouchDB, Couchbase, Redis
RDBMS vs NoSQL
Copyright © 直通硅谷
http://www.zhitongguigu.com
RDBMS(Relational Database Management System)
- Strong mathematical basis
- Declarative syntax
- A well-known language in Structured Query Language (SQL)
- Real-time Read
- Read/Write Consistency
RDBMS vs NoSQL
Copyright © 直通硅谷
http://www.zhitongguigu.com
NoSQL (Not Only SQL)
- It does not have schema
- Read/Write throughput is very high
- Horizontal scalability can be achieved easily
- Will support Bigdata in volumes of Terra Bytes & Peta Bytes
- Faster development life cycles for developers
- Can be hosted in cheaper hardware machines
- In memory caching option is available to increase the performance of queries
Lots of companies combine these two together
Copyright © 直通硅谷
http://www.zhitongguigu.com
For this question
Copyright © 直通硅谷
http://www.zhitongguigu.com
WE could just use NoSQL
Reason: we do not have complicated query to fetch data, we only need original URL and short URL (timestamp maybe)
How to remove unused links?
Copyright © 直通硅谷
http://www.zhitongguigu.com
How to update the Store?
How to remove them?
How to remove unused links?
Copyright © 直通硅谷
http://www.zhitongguigu.com
short -> Long
short -> time
time -> short
Is it okay?
How to remove unused links?
Copyright © 直通硅谷
http://www.zhitongguigu.com
If we could stretch the time
short -> Long
short -> time
Daily data purge job
If the QPS is very high
Everything is different
Copyright © 直通硅谷
http://www.zhitongguigu.com
Do we really need to be accurate to seconds to purge the data?
Since we have a daily job, it seems not necassary
Messaging System
Apache Kafka
RDBMS + NOSQL
Copyright © 直通硅谷
http://www.zhitongguigu.com
Relational Database cannot handle high reading QPS
NOSQL has a lot limit in relation and maintenance
So sometimes we just hybrid them together
RDBMS + NOSQL
Copyright © 直通硅谷
http://www.zhitongguigu.com
For common get queries, we store them in NOSQL
For specific queries and write operation, we still use RDBMS
How to update NOSQL? Event Listener
Summary
Copyright © 直通硅谷
http://www.zhitongguigu.com
Database: Relational and NoSql -> consider to use which one
Network: Apache, Tomcat, etc -> Rest API, JSON
Distributed System: Load balancing, Sticky Session
Security: HTTPS, Encryption/Decryption
[GoValley-201612] Design Lesson 2
By govalley201612
[GoValley-201612] Design Lesson 2
- 1,094