Intuition doesn't work at Scale

 

Ankeet Maini

@ankeetmaini

Zootopia!

Traffic

Usual days: x
BBD: 100x

Problem

Scaling the Modern Web App to 100x!

But first, let's see the modern web stack!

Top level Load balancer (Nginx)

Front End Load balancer (Nginx)

PM2

Node Instances

flipkart.com

API boxes, mSite etc.

n instances

PM2

Node Instances

..........................

Traffic

Current: 10x
BBD: 100x

All good, you think? But we're still stuck at 10x!

Bottleneck:

Network Bandwidth

Top level Load balancer (Nginx)

Front End Load balancer (Nginx)

Node

Content

1MB

Let's understand this with a scenario

Your target: 10,000 qps

Response payload: 1MB

Total bandwidth required: 10,000 MB/s

 

1GB = 1000MB*

So do the math :P

 

For 10,000 qps we need

10GBps
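
The arithmetic above can be sketched in a few lines. This is only an illustration: the function names are made up for this example, and it uses the slide's convention of 1GB = 1000MB.

```javascript
// Hypothetical helpers, not part of the original deck.

// Bandwidth (MB/s) needed to serve `qps` responses of `payloadMB` each.
function requiredBandwidthMBps(qps, payloadMB) {
  return qps * payloadMB;
}

// Highest request rate a line of `lineMBps` can carry for a given payload.
function maxQps(lineMBps, payloadMB) {
  return Math.floor(lineMBps / payloadMB);
}

const MB_PER_GB = 1000; // the slides' convention: 1GB = 1000MB

const neededMBps = requiredBandwidthMBps(10000, 1);
console.log(neededMBps / MB_PER_GB + ' GBps needed');           // 10 GBps needed
console.log(maxQps(1 * MB_PER_GB, 1) + ' qps on a 1 GBps line'); // 1000 qps on a 1 GBps line
```

With a 1MB payload, a 1 GBps line tops out at a tenth of the target, which is why shrinking the payload matters so much.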

Assume a 1 GBps line

Node

Content

1MB

...... n instances

Remember!

  • Small initial payload
  • Be extremely stingy, network wise
  • Anything extra degrades the performance
  • And we've all seen this hashtag so often: #perfmatters :P

Meanwhile, road widening is happening in Zootopia...

Uh! Wait!

  • What if I can't buy more bandwidth?

  • And I can't cut down on the content because, you know, uh, Product Managers :P

TA DA! Use Compression!

GZIP it!

But, before I tell you the-big-deal!

I like to create suspense and build-up :P

How many of you have written this?

var compression = require('compression');
var express = require('express');

var app = express();

app.use(compression());

Raise your hands please!

And exactly how many times have we read or heard...

Node

is

SINGLE THREADED

Always

  • Let Node take care of just the application-specific tasks, always! (No GZIPing in Node, ever!)
  • Let a reverse proxy do the Gzip compression!
  • Read "Performance Best Practices" on Express's official site (if you do use Express like us!).
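
To get a feel for what compression costs the Node process, here is a rough sketch that gzips a payload of about 1MB and times it. The payload is a hypothetical, highly repetitive HTML-like string, so real pages will compress less well. `zlib.gzipSync` is used only to make the cost easy to time; the `compression` middleware streams through zlib instead, but the CPU work still lands on the Node box.

```javascript
const zlib = require('zlib');

// Hypothetical ~1MB payload (30 bytes x 35,000 lines) of repetitive markup.
const payload = Buffer.from('<li class="product">Item</li>\n'.repeat(35000));

// Time a single synchronous gzip of the whole payload.
const start = process.hrtime.bigint();
const compressed = zlib.gzipSync(payload);
const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;

console.log('original bytes:   ' + payload.length);
console.log('compressed bytes: ' + compressed.length);
console.log('wall-clock ms:    ' + elapsedMs.toFixed(2));
```

Multiply that time by the target qps and it becomes clear why the work is better handed to a reverse proxy.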

Updated Architecture

Gzip at reverse proxy (Nginx)

Top level Load balancer (Nginx)

Front End Load balancer (Nginx)

PM2 managed Node instances

PM2 managed Node instances

.................. n instances

Now, also GZIPing for us

Pumped up with our latest finding we ran the load tests again...

Thinking nothing can put us down now...

Traffic

Current: 28x
BBD: 100x

And... we could only go up to 28x

Why?

Top level Load balancer (Nginx)

Front End Load balancer (Nginx)

PM2 managed Node instances

PM2 managed Node instances

.................. n instances

GZIPing

BOTTLENECK!

Co-hosted Node and Nginx

Top level Load balancer (Nginx)

Front End Load balancer (Nginx)

PM2 managed Node instances

PM2 managed Node instances

.................. n instances

GZIPing

Nginx

Nginx

Co-hosted Nginx and Node on the same box

We ran the load tests again...

Traffic

Current: 61x
BBD: 100x

We could scale to 61x!

61x is super cool, but remember our goal?

100x

With better roads come bigger cars!

Server Side Rendering

partially

What actually happens in Server Side Rendering?

Node
from Node, call APIs to get the data
generate HTML
req

1

2

3

Suspicious candidate for bottleneck? 1, 2 or 3?

What happens when you make HTTP (AJAX-style) calls from Node?

http://api.server:80/

API Server

Node

http://my.server:8080/
http://p.q.r.s:80/

API Server

Node

http://a.b.c.d:8080/

Src IP    Src Port   Dest IP    Dest Port
a.b.c.d   32769      p.q.r.s    80
a.b.c.d   32770      p.q.r.s    80
a.b.c.d   32771      p.q.r.s    80
a.b.c.d   32772      p.q.r.s    80

Ephemeral Ports (32768 - 61000)

So it's possible to run out of ephemeral ports

And anything that is possible is practical at Flipkart!
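
A back-of-the-envelope sketch shows how quickly that can happen. It assumes both ends of the range are usable and that every closed connection keeps its port busy for 60 seconds (the usual TIME_WAIT on Linux). Both are assumptions for illustration, and the function names are made up.

```javascript
// Hypothetical helpers, not part of the original deck.

// Number of ports in the range, treating both bounds as inclusive.
function ephemeralPortCount(low, high) {
  return high - low + 1;
}

// Assumption: a closed connection holds its source port for `timeWaitSeconds`,
// so only portCount / timeWaitSeconds new connections per second can be
// opened to one destination IP and port without running dry.
function sustainableConnectionsPerSecond(portCount, timeWaitSeconds) {
  return Math.floor(portCount / timeWaitSeconds);
}

const ports = ephemeralPortCount(32768, 61000);
console.log(ports);                                      // 28233
console.log(sustainableConnectionsPerSecond(ports, 60)); // 470
```

Under those assumptions, a few hundred fresh connections per second to a single API server is already enough to hit the ceiling.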

Solution Strategies

1. Connection Pooling

2. Increase Ephemeral Ports

# Linux

$ cat /proc/sys/net/ipv4/ip_local_port_range 
32768	61000
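
For the first strategy, connection pooling, one possible sketch uses Node's built-in `http.Agent` with `keepAlive`. The `maxSockets` value and the `callApi` helper are hypothetical, and `api.server` is the placeholder host from the earlier slide. This is not necessarily how it was done at Flipkart.

```javascript
const http = require('http');

// One shared agent for the API server: sockets stay open and are reused
// across requests instead of opening a new connection (and consuming a new
// ephemeral port) for every API call.
const apiAgent = new http.Agent({
  keepAlive: true, // keep idle sockets open for reuse
  maxSockets: 50,  // hypothetical cap on concurrent sockets per host
});

// Hypothetical helper: GET a path from the API server through the pool.
function callApi(path) {
  return new Promise((resolve, reject) => {
    const req = http.get(
      { host: 'api.server', port: 80, path: path, agent: apiAgent },
      (res) => {
        const chunks = [];
        res.on('data', (chunk) => chunks.push(chunk));
        res.on('end', () => resolve(Buffer.concat(chunks).toString('utf8')));
      }
    );
    req.on('error', reject);
  });
}
```

With the agent in place, the number of source ports in use is bounded by `maxSockets` rather than by the request rate.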

And we could finally scale to 94x!

Car Pooling to the rescue in Zootopia!

Thank you!

 

Ankeet Maini

@ankeetmaini
