Scaling 2000 Writes/Sec with MongoDB

SUCHIT PURI

App Dev with ThoughtWorks

@suchitpuri

LEADING LIFESTYLE CHANNEL IN THE UK

A service from channel4.com that lets you collect and keep bookmarks and recipes from your favorite shows.

When a Show Goes Live, the Load Increases Drastically

And the Response Time Peaks

Challenge

Build the system from scratch to deliver sub-3-second response times under a peak load of 2000 writes/sec.
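As a rough sizing check (our addition, not a figure from the talk), Little's law L = λW relates throughput, latency, and concurrency: if 2000 writes/sec each took the full 3-second budget, up to 6000 requests could be in flight at once. The sketch below only does that arithmetic; the class and method names are hypothetical.

```java
/** Hypothetical helper: Little's law sizing, L = lambda * W. */
final class LittlesLaw {
    private LittlesLaw() {}

    /**
     * Upper bound on requests in flight when requests arrive at
     * arrivalRatePerSec and each one takes latencySec to complete.
     */
    static long inFlight(double arrivalRatePerSec, double latencySec) {
        if (arrivalRatePerSec < 0 || latencySec < 0) {
            throw new IllegalArgumentException("rate and latency must be non-negative");
        }
        // Round up: a fractional request still needs a slot.
        return (long) Math.ceil(arrivalRatePerSec * latencySec);
    }
}
```

`LittlesLaw.inFlight(2000, 3.0)` returns 6000, while at a 0.5-second average latency the same load needs only about 1000 concurrent requests. Latency, not only raw write speed, therefore drives how many connections and threads each tier has to hold.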

The Choice We Made

Good support from 10gen
A document store that is easy to understand and relate to
Out-of-the-box horizontal scaling
Good monitoring solutions available from 10gen (MMS)
Reported to handle more than 10,000 writes/sec

http://www.severalnines.com/blog/nosql-battle-east-coast-benchmarking-mongodb-vs-tokumx-cluster

Initial Architecture

First Performance Run

Ran with 9000 virtual users; started getting HTTP 503 errors at 7350 users.

We were able to scale to only 75 req/sec

What Went Wrong

Front-end servers: reaching maximum CPU utilization because of image capture (initially only 3 front-end servers)

Backend (1 MongoDB shard): mostly waiting for I/O from the MongoDB server

Learnings - MongoDB's Global Lock Really Hurts

MongoDB 2.0: a global lock for the whole mongod process.
MongoDB 2.2+: database-level locking.
To effectively increase write throughput, you need to add more shards, since each shard runs its own mongod with its own lock.
The application runs with w=2 (the "replica safe" write concern: each write waits for acknowledgement from two replica set members).
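To make the locking point concrete, the toy model below (our own illustration, not MongoDB's implementation; `LockTable` and its methods are hypothetical names) shows what changes between the two versions. Under a global lock, writes to any database queue on the same lock; under database-level locking, only writes to the same database contend. If most writes land in one database, the finer granularity buys little, and the way to get more independent write locks is more shards, each running its own mongod.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy model of lock granularity inside one mongod process (not MongoDB's code). */
final class LockTable {
    enum Granularity { GLOBAL, PER_DATABASE }

    private final Granularity granularity;
    private final ReadWriteLock globalLock = new ReentrantReadWriteLock();
    private final ConcurrentMap<String, ReadWriteLock> dbLocks = new ConcurrentHashMap<>();

    LockTable(Granularity granularity) {
        this.granularity = granularity;
    }

    /** Returns the lock that a write to the given database must hold. */
    ReadWriteLock lockFor(String database) {
        if (granularity == Granularity.GLOBAL) {
            return globalLock; // every database contends on one lock (2.0-style)
        }
        // One lock per database, created on first use (2.2-style).
        return dbLocks.computeIfAbsent(database, name -> new ReentrantReadWriteLock());
    }

    /** Runs a write operation while holding the appropriate write lock. */
    void write(String database, Runnable op) {
        ReadWriteLock lock = lockFor(database);
        lock.writeLock().lock();
        try {
            op.run();
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```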

Oh... Did We Forget to Configure Auth for Our DB Servers?

2ND PERFORMANCE RUN - WITH AUTHENTICATION TURNED ON

exception in thread "pool-1-thread-25" com.mongodb.DBPortPool$SemaphoresOut: Out of semaphores to get db connection
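As we understand the 2.x Java driver, this error means more threads were waiting for a pooled connection than the pool allows; the driver options `connectionsPerHost` and `threadsAllowedToBlockForConnectionMultiplier` set that cap. If authentication makes each connection checkout slower, threads queue up longer and the cap is hit sooner. The class below is a simplified model built on `java.util.concurrent.Semaphore`, not the driver's actual implementation; `BoundedWaitPool` and its methods are hypothetical.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/**
 * Simplified model of a connection pool that also caps how many threads may
 * be trying to obtain a connection. Not the MongoDB driver's implementation.
 */
final class BoundedWaitPool {
    private final Semaphore connections; // one permit per pooled connection
    private final Semaphore waitSlots;   // caps threads inside acquire()

    BoundedWaitPool(int poolSize, int waitMultiplier) {
        this.connections = new Semaphore(poolSize);
        this.waitSlots = new Semaphore(poolSize * waitMultiplier);
    }

    /**
     * Tries to check out a connection within timeoutMs.
     * Returns false on timeout; throws if too many threads are already waiting.
     */
    boolean acquire(long timeoutMs) {
        if (!waitSlots.tryAcquire()) {
            // Analogous to the driver's "Out of semaphores" failure.
            throw new IllegalStateException("Out of semaphores to get db connection");
        }
        try {
            return connections.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        } finally {
            waitSlots.release(); // leaving acquire() frees a wait slot
        }
    }

    /** Returns a previously acquired connection to the pool. */
    void release() {
        connections.release();
    }
}
```

The model shows why raising the pool size or the wait multiplier only moves the failure point: if connections are held longer per request, the queue still fills once arrivals outpace releases.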

Raised [MongoDB-JIRA] (CS-3016): Authentication Causes Decreased Performance in Channel's Java Application

https://jira.mongodb.org/browse/SERVER-5418

3rd Performance Test

New feature added: get the count of all bookmarks with a specific tag for a given user

What Went Wrong

The Bookmark.count() query took too long (~55 sec) to return a response.

Learning

Did we miss the index? No.
MongoDB used non-counting B-trees as its index data structure, so a count had to walk every matching index entry.

https://jira.mongodb.org/browse/SERVER-1752
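A counting B-tree stores subtree sizes in its nodes, so the number of keys in a range is the difference of two ranks and costs O(log n). A non-counting B-tree has to visit every key in the range, so a count over many matches costs O(matches). The sketch below contrasts the two, using a sorted array as a stand-in for the index and binary search as the rank lookup; it illustrates the idea only, is not MongoDB's index code, and its names are hypothetical.

```java
/** Contrasts a scan-based count with a rank-based count over a sorted index. */
final class IndexCount {
    private IndexCount() {}

    /** Non-counting style: visits every key in [lo, hi), one step per match. */
    static int scanCount(long[] sortedKeys, long lo, long hi) {
        int count = 0;
        for (int i = lowerBound(sortedKeys, lo); i < sortedKeys.length && sortedKeys[i] < hi; i++) {
            count++; // one step per matching key
        }
        return count;
    }

    /** Counting style: rank(hi) - rank(lo), two O(log n) lookups. */
    static int rankCount(long[] sortedKeys, long lo, long hi) {
        return Math.max(0, lowerBound(sortedKeys, hi) - lowerBound(sortedKeys, lo));
    }

    /** Index of the first key >= target, found by binary search. */
    static int lowerBound(long[] sortedKeys, long target) {
        int left = 0;
        int right = sortedKeys.length;
        while (left < right) {
            int mid = (left + right) >>> 1;
            if (sortedKeys[mid] < target) {
                left = mid + 1;
            } else {
                right = mid;
            }
        }
        return left;
    }
}
```

Both methods return the same number; only the work differs. With millions of bookmarks under one user and tag, the scan version does millions of steps while the rank version does a few dozen.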

We started maintaining the counts locally in every collection instead of calling count().
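Maintaining the count ourselves means every write that adds or removes a bookmark also adjusts a stored counter, so reading the count is a single lookup rather than an index scan. In MongoDB such a counter would typically be updated with the `$inc` update operator; the in-memory Java sketch below only illustrates the bookkeeping, and the class, its methods, and the user/tag key scheme are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical store that keeps a per-(user, tag) bookmark count current on every write. */
final class BookmarkCounts {
    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

    private static String key(String userId, String tag) {
        return userId + '\u0000' + tag; // separator unlikely to appear in ids or tags
    }

    /** Call when a bookmark with this tag is saved for the user. */
    void onBookmarkAdded(String userId, String tag) {
        counts.computeIfAbsent(key(userId, tag), k -> new LongAdder()).increment();
    }

    /** Call when such a bookmark is deleted. */
    void onBookmarkRemoved(String userId, String tag) {
        LongAdder adder = counts.get(key(userId, tag));
        if (adder != null) {
            adder.decrement();
        }
    }

    /** Constant-time read: no scan over bookmarks. */
    long count(String userId, String tag) {
        LongAdder adder = counts.get(key(userId, tag));
        return adder == null ? 0L : adder.sum();
    }
}
```

The trade-off is that the bookmark write and the counter update are separate operations, so a failure between them can leave the count slightly off; a periodic reconciliation job that recomputes counts offline is a common way to correct drift.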

Final Production Architecture

Final Throughput

10,000+ Simultaneous Virtual Users

2000+ Writes/Sec

4th Performance Test - Can MongoDB Scale Horizontally Under Load?

Learnings

Adding or removing a shard under load triggers the balancer and chunk-splitting background processes, which try to rebalance data across the shards.

This puts a lot of extra load on the existing machines, because it increases network traffic and takes write locks on them.
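A back-of-the-envelope model shows why changing the shard count is expensive. If C chunks are spread evenly over N shards, adding one shard means roughly C/(N+1) chunks have to move to it, and removing a shard means draining all of its roughly C/N chunks; each migration copies the chunk's documents over the network. The sketch below assumes an even distribution and full chunks of a fixed size (64 MB was the default chunk size in MongoDB 2.x); the real balancer works in rounds with migration thresholds, so treat the result as an order-of-magnitude estimate. The class and method names are hypothetical.

```java
/** Hypothetical estimate of balancer work when the shard count changes by one. */
final class RebalanceEstimate {
    private RebalanceEstimate() {}

    /** Chunks that must move to a newly added shard: C / (N + 1), rounded down. */
    static long chunksToMigrate(long totalChunks, int shardsBefore) {
        validate(totalChunks, shardsBefore);
        return totalChunks / (shardsBefore + 1);
    }

    /** Chunks that must leave a shard being removed: C / N, rounded down. */
    static long chunksToDrain(long totalChunks, int shardsBefore) {
        validate(totalChunks, shardsBefore);
        return totalChunks / shardsBefore;
    }

    /** Upper bound on bytes copied when adding a shard, assuming every chunk is full. */
    static long bytesToCopy(long totalChunks, int shardsBefore, long chunkSizeBytes) {
        return chunksToMigrate(totalChunks, shardsBefore) * chunkSizeBytes;
    }

    private static void validate(long totalChunks, int shardsBefore) {
        if (totalChunks < 0 || shardsBefore < 1) {
            throw new IllegalArgumentException("need totalChunks >= 0 and at least one shard");
        }
    }
}
```

For example, with a hypothetical 300 chunks on 2 shards, adding a third shard moves about 100 chunks, up to roughly 6.25 GiB at 64 MB each, all while the cluster is still serving production writes.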

Thank You

@suchitpuri

Questions?