Scalability

Skeepers Influence Workshop

Your Hosts Tonight

Alex Fernández (pinchito)

Director of Eng, Influence

Platform Lead, Influence

Alfredo López

What We Will See

What is Scalability?

Horizontal and Vertical Scaling

Scaling Strategies

🚀 What is scalability

Source

🪜 Definition of scalability

The capacity to be changed in size or scale.

The ability of a computing process to be used or produced in a range of capabilities.

Lexico (Oxford Dictionary)

⛷️ Scale Up and Down!

📚 Use in Literature

🐧 Example: Linux

From embedded

to supercomputers

✋ What makes something Scalable?

The right question is: why doesn't it scale?

Scarcity of a resource
Growing wait times
Inability to answer
Blocking on a resource
...

📝 Exercise: Non-Scalable Service

Install Node.js

Install loadtest

$ npm install -g loadtest

Run the command

$ loadtest https://gorest.co.in/public/v1/users -n 2000 -c 100 --keepalive

Write down the average for requests per second (rps),

average latency and number of errors

⮯

📝 Exercise +

Adjust rps from 10 to 100 in steps of 10

$ loadtest http://service.pinchito.es:3000/a -n 2000 -c 100 --rps 30 -k

$ loadtest http://service.pinchito.es:3000/a -n 2000 -c 100 --rps 40 -k

...

Write down rps, latency and errors

Go up to 100, then 1000

$ loadtest http://service.pinchito.es:3000/a -n 2000 -c 100 --rps 1000 -k
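The sweep can be scripted rather than typed by hand. A minimal sketch that only builds the command lines, reusing the URL and flags from the exercise; the helper name `sweep_commands` is hypothetical:

```python
# Sketch: build the loadtest command lines for an rps sweep.
# URL and flags are copied from the exercise; sweep_commands is a made-up helper.

def sweep_commands(url, rates, requests=2000, concurrency=100):
    """Return one loadtest command string per target rps."""
    return [
        f"loadtest {url} -n {requests} -c {concurrency} --rps {rps} -k"
        for rps in rates
    ]


# 10, 20, ... 100 requests per second
commands = sweep_commands("http://service.pinchito.es:3000/a", range(10, 101, 10))
for command in commands:
    print(command)
```

Each printed line can be pasted into a shell; write down rps, latency and errors after every run.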

⮯

📝 Exercise +

Draw a graph of rps sent vs rps actually served

Another graph with rps vs latency

⮯

📝 Exercise +

Now test against:

$ loadtest https://www.google.com -n 2000 -c 100 -k

What is the difference?

How do latency and rps behave now?

⮯

👍 Success!

🥛 What Resource Ran Out?

Graph with CPU from AWS:

300 rps

400 rps

1000 rps

🚂 rps vs throughput

Source

Source

📈 Scalability Profiles

Brendan Gregg: Systems Performance

🚒 Latency vs rps

Source

⚖️ Little's Law

Little, 1952 - 1960

The average number of requests in flight L equals:

the rate of requests per second λ

multiplied by the average request time W.

L = λW

Wikipedia

If we increase concurrency L while the rate λ stays flat (a saturated server),

the average time per request W grows proportionally.
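A small worked example of Little's Law, rearranged as W = L / λ. The figure of 400 requests per second is an assumption for illustration, standing in for a server that cannot go any faster:

```python
# Little's Law: L = λ W, so W = L / λ.
# The 400 rps ceiling is a hypothetical figure, not a measurement.

def average_time(in_flight, rate):
    """Average time per request W in seconds, given L requests in flight and rate λ."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return in_flight / rate


SATURATED_RPS = 400.0

for concurrency in (100, 200, 400):
    seconds = average_time(concurrency, SATURATED_RPS)
    print(f"L={concurrency:>3}  W={seconds:.2f} s")
```

Doubling the concurrency at the same rate doubles the time per request, which is what the loadtest runs show once the service stops keeping up.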

⇕ Vertical and ⇔ Horizontal Scaling

🧓 Hard Beginnings

IBM mainframe

💽 Specialized Servers

🗄️ The Usual Cabinets

🤖 And then Google Arrived

Google storage cabinet, 1996

⇕ Vertical Scaling

Buy a bigger machine

And bigger

Until you run out of machines

Hard to go back to a smaller machine 😅

⇕ Vertical Scaling

🤫 Sshhh...

For many decades now, supercomputers have been just...

clusters of smaller machines

IBM Blue Gene/P: 164k cores, 2007

⇔ Horizontal Scaling

Use many similar machines for a given function

("provisioning")

Add or remove machines to scale

When a machine fails, it is removed from service

⇔ Horizontal Scaling

📝 Exercise: Storage

Design a corporate storage system with 15 TB

Option 1 ⇕: storage area network (SAN)

Best option as of December 2008

Option 2 ⇔: raw hard drives

Best option as of July 2009

⮯

📝 Exercise +

Add controllers

RAID options (Redundant Array of Inexpensive Disks)

Measure the $ difference between option 1⇕ and option 2⇔

Final price?

⮯

📝 Exercise +

Consider redundancy strategies

Fault tolerance

Redundancy options: 2x, 3x, ?

Consider scaling strategies

How do they affect the price?

⮯

👍 Success!

⇔ Horizontal Strategies

🤹 Balancing (server-side)

🕵️ Balancing (client-side)

💝 Affinity

🔱 Independence

🍇 Clustering

🔑 Sharding

🧬 Replication

⌛ Queues

🤹 Server-side Balancing

Example: AWS ELB, Google Cloud Load Balancing

🕵️ Client-Side Balancing

Example: Facebook client

Chooses the API endpoint in the browser

Example: DNS balancing

💝 Affinity ⇔

By cookie or by geography

Needs a sophisticated router

Client-side or server-side

🔱 Independence ⇔

Neutral (or blind) balancing
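Round-robin is one common way to balance blindly: every request goes to the next machine in turn, whoever the client is. The slide does not name an algorithm, so this choice and the backend names are assumptions:

```python
# Sketch of blind balancing with round-robin.
# Backend names are hypothetical; the balancer ignores who the client is.
from itertools import cycle


class RoundRobinBalancer:
    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = cycle(backends)

    def pick(self):
        """Return the next backend in rotation."""
        return next(self._backends)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
picks = [balancer.pick() for _ in range(6)]
print(picks)
```

Since no request depends on landing on a particular machine, machines can be added or removed freely.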

🍇 Clustering ⇔

Generic term "cluster":

create one machine out of many

In databases it usually means having more than one server,

all equivalent

🔑 Sharding ⇔

Balancing by key

Needs a sharding algorithm (usually with hashing)
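A minimal sketch of hash-based sharding: hash the key, then take the remainder by the number of shards. The key format and the shard count are assumptions for illustration:

```python
# Sketch: pick a shard from a key with a stable hash.
# hashlib is used instead of the built-in hash(), which is salted per process for strings.
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard number in the range 0 .. shard_count - 1."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


for key in ("user:1", "user:2", "user:3"):
    print(key, "-> shard", shard_for(key, 4))
```

With plain modulo, changing the number of shards moves most keys to a different shard; consistent hashing is a common alternative when shards come and go.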

🧬 Replication ⇔

A primary server (read + write) and several replicas (read-only)

Useful when reading > writing
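A toy in-memory model of the routing: writes go to the primary, reads rotate over the replicas. The class and its methods are hypothetical, and the copy to replicas is synchronous only to keep the sketch short:

```python
# Toy model of primary/replica routing, not a real database client.
from itertools import cycle


class ReplicatedStore:
    def __init__(self, replica_count):
        if replica_count < 1:
            raise ValueError("at least one replica is required")
        self.primary = {}
        self.replicas = [dict() for _ in range(replica_count)]
        self._next_replica = cycle(range(replica_count))

    def write(self, key, value):
        """All writes go to the primary, then get copied to every replica."""
        self.primary[key] = value
        # Simplification: real systems usually replicate asynchronously,
        # so a replica may briefly return stale data.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        """Reads are spread over the replicas; returns (replica index, value)."""
        index = next(self._next_replica)
        return index, self.replicas[index].get(key)


store = ReplicatedStore(replica_count=2)
store.write("page:1", "hello")
print(store.read("page:1"))
print(store.read("page:1"))
```

Adding replicas increases read capacity, but every write still passes through the single primary.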

🐫 Active Replication ⇔

Active-active, multiple primary...

Needs a reconciliation algorithm to resolve conflicting writes

⌛ Queues ⇔

Production of tasks independent of consumption

Mechanism for polling

(NOT pooling 🙏)
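A minimal in-process sketch of decoupling producers from consumers with the standard library `queue.Queue`. The squaring task is a placeholder. Workers here block on `get()` instead of polling, which a consumer of a remote queue service would typically do:

```python
# Sketch: tasks are produced independently of how fast they are consumed.
import queue
import threading


def run_pipeline(task_count, worker_count):
    tasks = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            item = tasks.get()          # blocks until a task is available
            if item is None:            # sentinel: no more work for this worker
                tasks.task_done()
                break
            with lock:
                results.append(item * item)   # placeholder for real work
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for number in range(task_count):    # producer side
        tasks.put(number)
    for _ in threads:                   # one sentinel per worker
        tasks.put(None)
    tasks.join()
    for thread in threads:
        thread.join()
    return sorted(results)


print(run_pipeline(task_count=5, worker_count=2))  # [0, 1, 4, 9, 16]
```

Scaling consumption is then a matter of changing `worker_count`, without touching the producer.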

📝 Exercise: Scalable Storage

You work for search engine Fooble in January 2000

You have to store the search index

Design a scaling strategy

Assume index size = total size of the pages

Use contemporary disk drives

⮯

📝 Exercise +

10 KB per page

50 million pages

4 million searches per day

10 search terms max

Target time of 0.1 seconds per search

⮯

📝 Exercise +

50M pages × 10 KB = 500 GB

Cheapest disk drive: Seagate ST317242A, 17.2 GB, $152

32 disks × 16 GB = 512 GB, $4864

8 servers × 4 disks = 32 disks

4M searches × 100 ms = 400k seconds per day ÷ 86,400 s/day ≈ 4.6 servers

Adding peak time: at least 8 servers
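The arithmetic above can be checked with a few lines. All figures come from the exercise; the variable names are made up, and the 16 GB per disk is the capacity used on the slide:

```python
# Capacity estimate for the Fooble exercise, using the figures from the slides.
import math

PAGES = 50_000_000
PAGE_KB = 10
DISK_GB = 16                 # capacity per disk as counted on the slide
DISK_PRICE_USD = 152
SEARCHES_PER_DAY = 4_000_000
SEARCH_MS = 100
SECONDS_PER_DAY = 86_400

index_gb = PAGES * PAGE_KB / 1_000_000            # KB to GB
disks = math.ceil(index_gb / DISK_GB)
disk_cost = disks * DISK_PRICE_USD
busy_seconds = SEARCHES_PER_DAY * SEARCH_MS / 1000
servers_average = busy_seconds / SECONDS_PER_DAY  # servers busy 24 h a day

print(f"index: {index_gb:.0f} GB")                   # index: 500 GB
print(f"disks: {disks}, cost: ${disk_cost}")         # disks: 32, cost: $4864
print(f"servers on average: {servers_average:.1f}")  # servers on average: 4.6
```

The 4.6 figure assumes searches are spread evenly over the day, which is why the estimate is rounded up to 8 servers for peak time.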

⮯

📝 Exercise +

100 ms for ~5 search terms

Average query time to storage < 20 ms

⮯

📝 Exercise +

Query time: seek time + 1/2 turn + formatting

Seek time: ~8 ms

7200 rpm disk drive: ~8.3 ms per turn, so ~4 ms for half a turn

Query total: >12 ms

Seems doable; better add some caching
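A quick check of the disk timing against the budget of 100 ms for about 5 search terms. The function names are hypothetical, and the unspecified "formatting" cost is left out, so the result is a lower bound:

```python
# Sketch: average access time for a spinning disk versus the per-term budget.

def rotation_ms(rpm):
    """Time for one full turn, in milliseconds."""
    return 60_000 / rpm


def access_ms(seek_ms, rpm):
    """Seek time plus, on average, half a turn of rotational latency."""
    return seek_ms + rotation_ms(rpm) / 2


budget_ms = 100 / 5              # 100 ms target spread over ~5 search terms
estimate_ms = access_ms(seek_ms=8, rpm=7200)

print(f"one turn: {rotation_ms(7200):.1f} ms")   # one turn: 8.3 ms
print(f"access:   {estimate_ms:.1f} ms")         # access:   12.2 ms
print(f"within budget: {estimate_ms < budget_ms}")
```

The estimate leaves roughly 8 ms of headroom per term before formatting, which is why caching is worth adding.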

⮯

👍 Well done!

📚 Bibliography

Brendan Gregg: Systems Performance: Enterprise and the Cloud

John Allspaw: Web Operations: Keeping the Data On Time

HighScalability.com: Favorite posts on HighScalability