Design Lesson 2

Copyright © 直通硅谷

http://www.zhitongguigu.com

Design pieces

  • Accuracy
  • Performance
  • Robustness
  • Scalability
  • Test

Copyright © 直通硅谷

http://www.zhitongguigu.com

Knowledge pieces

  • Database
  • Network,Server,TCP/IP
  • Security, Encryption/Decryption
  • Distributed System
  • Backup System
  • *Hadoop

Copyright © 直通硅谷

http://www.zhitongguigu.com

Copyright © 直通硅谷

http://www.zhitongguigu.com

BBS System

BBS System

  • User could sign up by username, email
  • User could edit its own profile
  • User could post
  • User could reply to the post
  • User could like some post
  • User could go to someone's profile to see all the posts

Copyright © 直通硅谷

http://www.zhitongguigu.com

First Question:

Copyright © 直通硅谷

http://www.zhitongguigu.com

What is the key point for this question?

Database Design

BBS System

Copyright © 直通硅谷

http://www.zhitongguigu.com

What do we need?

First: Extract basic elements

User, Post, Profile, Reply

Copyright © 直通硅谷

http://www.zhitongguigu.com

Most Important question

Copyright © 直通硅谷

http://www.zhitongguigu.com

Design endpoint to fulfill these requirements

Backend: How to join the table to give the results

endpoint to fulfill the requirements

Copyright © 直通硅谷

http://www.zhitongguigu.com

  • Sign up
  • Sign in
  • Profile Edit
  • Post
  • Reply
  • Like
  • Unlike
  • Edit Post/Reply

How to send the request

Copyright © 直通硅谷

http://www.zhitongguigu.com

Apache

Nginx

Lighttpd

IIS

Tomcat

Jetty

Netty

How to join the table to give the results

Copyright © 直通硅谷

http://www.zhitongguigu.com

  • Sign up
  • Sign in
  • Profile Edit
  • Post
  • Reply
  • Like
  • Unlike
  • Edit Post/Reply

Basic Operation of SQL

Copyright © 直通硅谷

http://www.zhitongguigu.com

  • Select
  • From
  • Where
  • Join
    • Left Join
    • Right Join
    • Outer Join
  • Delete
  • Create
  • Insert Into
  • Update
  • Distinct
  • Count
  • Min
  • Max
  • Avg

Follow up question

  • If the user can see who likes the post?
  • If some one could reply other's reply? (like wechat discovery)

Copyright © 直通硅谷

http://www.zhitongguigu.com

Copyright © 直通硅谷

http://www.zhitongguigu.com

Scalability (System Design)

  • How to improve the system if there are thousands of replies?
  • How to improve the system if there are thousands of users?

Copyright © 直通硅谷

http://www.zhitongguigu.com

System Design

Database Sharding

  • Vertical partitioning. Normalization is one way of vertical partition. We split columns into different tables. Or even further, we have frequent used table of data stored on a faster machine.
  • Horizontal partitioning. (Shard). Instead of vertical split the columns, we split the rows. The tables are divided and distributed into different server.

Copyright © 直通硅谷

http://www.zhitongguigu.com

System Design

Scaling up

  • Vertical Scaling
    • CPU, Disk(PATA, SATA, SSD), RAM
  • Horizontal Scaling
    • Load Balancer
    • Database Replication

Copyright © 直通硅谷

http://www.zhitongguigu.com

System Design

Load Balancer

  • load balancer has public IP address for DNS to return
  • backend worker servers has private IP address for safety
  • How to divide work?
    • determine which is busy
    • divide by work category
    • round robin(different ip address at different times)
      • login again and again
        • Write session to hard disk
        • Sticky Session

Copyright © 直通硅谷

http://www.zhitongguigu.com

System Design

Server & Database Replication

  • Master-Slave
    • Good for read-heavy websites.
    • Free backup
    • Slave become master if master is down
  • Master-Master
    • Prevent master down

Copyright © 直通硅谷

http://www.zhitongguigu.com

System Design

Caching

Fetching from DB is slow

  • Caching queries
    • Most commonly used
    • Query (hash) -> result
    • If one cell is updated, delete all cached queries
  • Caching objects (Redis)
    • See data as an object
    • Build dataset from DB and store the whole. (page)
    • Activity steams
    • A smaller but faster DB in mem.

Copyright © 直通硅谷

http://www.zhitongguigu.com

System Design

CDN

Content Distributed Network

  • CDNs replicate content to multiple places.
  • Closer, less hops
  • Amazon CloudFront

Copyright © 直通硅谷

http://www.zhitongguigu.com

How to improve the system if there are thousands of replies/users?

Copyright © 直通硅谷

http://www.zhitongguigu.com

  • Pagination: fetch the data as pages
  • Async: preload the most important data
  • Sharding: horizontal partition of data in a database or search engine

Short URL

Copyright © 直通硅谷

http://www.zhitongguigu.com

Short URL

Copyright © 直通硅谷

http://www.zhitongguigu.com

In Twitter, Weibo, you have charactor limit. Also normal URL will contain some invalid ASCII code that you cannot put into the content.

 

So a lot of company has its own Short URL

Example: http://t.cn/RGKGR3T

Short URL

Copyright © 直通硅谷

http://www.zhitongguigu.com

Design A System that can generate this Short URL

  • The user input an URL, the server returns a short one
  • Other user click this short URL, it will redirect to the real URL
  • If the URL has not been used (no one clicks for 72 hours, configurable), it will be removed

How to generate a short URL

Copyright © 直通硅谷

http://www.zhitongguigu.com

UUID

MD5

 

Are these too long? Could we find a better way?

Think about the requirement

one URL could be used released after some day

How to generate a short URL

Copyright © 直通硅谷

http://www.zhitongguigu.com

We do not generate a short URL everytime, we just lookup the table

How to send request

Copyright © 直通硅谷

http://www.zhitongguigu.com

Apache Tomcat

Design a Database

Copyright © 直通硅谷

http://www.zhitongguigu.com

Short URL -> Long URL, time

Pretty Easy? If we have a high QPS?

What DB we should use?

Copyright © 直通硅谷

http://www.zhitongguigu.com

Relational Database: Oracle, MySql

Non-Relational Database (NO Sql): MangoDB, CouchDB, Couchbase, Redis

RDBMS vs NoSQL

Copyright © 直通硅谷

http://www.zhitongguigu.com

RDBMS(Relational Database Management System)

  • Strong mathematical basis
  • Declarative syntax
  • A well-known language in Structured Query Language (SQL)
  • Real-time Read
  • Read/Write Consistency

RDBMS vs NoSQL

Copyright © 直通硅谷

http://www.zhitongguigu.com

NoSQL (Not Only SQL)

  • It does not have schema
  • Read/Write throughput is very high
  • Horizontal scalability can be achieved easily
  • Will support Bigdata in volumes of Terra Bytes & Peta Bytes
  • Faster development life cycles for developers
  • Can be hosted in cheaper hardware machines
  • In memory caching option is available to increase the performance of queries

Lots of companies combine these two together

Copyright © 直通硅谷

http://www.zhitongguigu.com

For this question

Copyright © 直通硅谷

http://www.zhitongguigu.com

WE could just use NoSQL

 

Reason: we do not have complicated query to fetch data, we only need original URL and short URL (timestamp maybe)

How to remove unused links?

Copyright © 直通硅谷

http://www.zhitongguigu.com

How to update the Store?

How to remove them?

How to remove unused links?

Copyright © 直通硅谷

http://www.zhitongguigu.com

short -> Long

short -> time

time -> short

 

Is it okay?

How to remove unused links?

Copyright © 直通硅谷

http://www.zhitongguigu.com

If we could stretch the time

short -> Long

short -> time

Daily data purge job

If the QPS is very high

Everything is different

Copyright © 直通硅谷

http://www.zhitongguigu.com

Do we really need to be accurate to seconds to purge the data?

Since we have a daily job, it seems not necassary

Messaging System

Apache Kafka

RDBMS + NOSQL

Copyright © 直通硅谷

http://www.zhitongguigu.com

Relational Database cannot handle high reading QPS

NOSQL has a lot limit in relation and maintenance

 

So sometimes we just hybrid them together

RDBMS + NOSQL

Copyright © 直通硅谷

http://www.zhitongguigu.com

For common get queries, we store them in NOSQL

For specific queries and write operation, we still use RDBMS

How to update NOSQL? Event Listener

Summary

Copyright © 直通硅谷

http://www.zhitongguigu.com

Database: Relational and NoSql -> consider to use which one

 

Network: Apache, Tomcat, etc -> Rest API, JSON

 

Distributed System: Load balancing, Sticky Session

 

Security: HTTPS, Encryption/Decryption

Made with Slides.com