Scaling to 2000 Writes/Sec with MongoDB
SUCHIT PURI
App Dev with ThoughtWorks
@suchitpuri
LEADING LIFESTYLE CHANNEL IN THE UK
A service from channel4.com that lets you collect and keep bookmarks and recipes from your favorite shows.
When a Show Goes Live, the Load Increases Drastically
And the Response Time Peaks
Challenge
Build the system from scratch to deliver sub-3-second response times under a peak load of 2000 writes/sec.
The Choice We Made
- Good support from 10gen
- It is a document store, which is easy to understand and relate to
- Out-of-the-box horizontal scaling
- Good monitoring solutions available from 10gen (MMS)
- Published benchmarks claimed more than 10,000 writes/sec
http://www.severalnines.com/blog/nosql-battle-east-coast-benchmarking-mongodb-vs-tokumx-cluster
Initial Architecture
First Performance Run
Ran with 9000 virtual users; 7350 of them (about 82%) received HTTP 503 responses.
We were able to scale to only 75 requests/sec.
What Went Wrong
- Front-end servers: reaching max CPU utilization because of the image capture (initially only 3 front-end servers)
- Backend (1 shard of Mongo): mostly waiting for I/O from the Mongo server
Learnings - MongoDB's Global Lock Really Hurts
- Mongo 2.0: global lock for the whole mongod process.
- Mongo 2.2+: database-level locking.
- To effectively increase write throughput, you need to add more shards.
- The application runs with w=2 ("replica safe" write mode), so each write waits until two replica set members have acknowledged it.
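Because the write lock is held per mongod process (2.0) or per database (2.2+), write capacity grows roughly with the number of shards. The sketch below is a back-of-the-envelope estimate only: it assumes writes spread evenly across shards, and the per-shard rate of 500 writes/sec and the `headroom` factor are hypothetical values, not figures from the talk.

```python
import math


def shards_needed(target_writes_per_sec, per_shard_writes_per_sec, headroom=1.0):
    """Estimate the shard count for a target write rate.

    Assumes writes are spread evenly across shards.
    per_shard_writes_per_sec is a hypothetical figure you would measure
    on your own hardware; headroom > 1 reserves spare capacity.
    """
    if per_shard_writes_per_sec <= 0:
        raise ValueError("per-shard write rate must be positive")
    return math.ceil(target_writes_per_sec * headroom / per_shard_writes_per_sec)


if __name__ == "__main__":
    # Target from the talk (2000 writes/sec) against an assumed per-shard rate.
    print(shards_needed(2000, 500))
    print(shards_needed(2000, 500, headroom=1.5))
```

The estimate is only as good as the measured per-shard rate, which itself depends on the write mode: with w=2 every write also waits on replication.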
Oh... Did We Forget to Configure Auth for Our DB Servers?
2nd Performance Run - With Authentication Turned On
Exception in thread "pool-1-thread-25" com.mongodb.DBPortPool$SemaphoresOut: Out of semaphores to get db connection
Raised [MongoDB-JIRA] (CS-3016): Authentication Causes Decreased Performance in Channel's Java Application
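The exception above is what a bounded connection pool reports when too many threads are already queued for a connection. The driver's internals are not shown in the talk, so the following is a simplified model rather than the driver's code: it assumes the pool caps both the open connections and the number of threads allowed to wait. `BoundedPool`, `wait_multiplier`, and `waiting` are hypothetical names.

```python
import threading
import time


class SemaphoresOut(Exception):
    """Raised when too many threads are already waiting for a connection."""


class BoundedPool:
    """Toy pool: a fixed number of connections and a capped waiting queue."""

    def __init__(self, connections, wait_multiplier):
        self._connections = threading.Semaphore(connections)
        self._max_waiters = connections * wait_multiplier
        self._waiting = 0
        self._lock = threading.Lock()

    def waiting(self):
        with self._lock:
            return self._waiting

    def acquire(self, timeout=None):
        with self._lock:
            # Fail fast instead of queueing when the waiting line is full.
            if self._waiting >= self._max_waiters:
                raise SemaphoresOut("Out of semaphores to get db connection")
            self._waiting += 1
        try:
            if not self._connections.acquire(timeout=timeout):
                raise TimeoutError("timed out waiting for a connection")
        finally:
            with self._lock:
                self._waiting -= 1

    def release(self):
        self._connections.release()


def demo():
    pool = BoundedPool(connections=1, wait_multiplier=1)
    pool.acquire()                     # the only connection is now in use
    waiter = threading.Thread(target=pool.acquire, kwargs={"timeout": 5})
    waiter.start()
    while pool.waiting() < 1:          # wait until the thread is queued
        time.sleep(0.01)
    try:
        pool.acquire(timeout=5)        # the queue is full, so this fails at once
        outcome = "acquired"
    except SemaphoresOut as exc:
        outcome = str(exc)
    pool.release()                     # hand the connection to the queued thread
    waiter.join()
    return outcome


if __name__ == "__main__":
    print(demo())
```

In this model, anything that makes each operation hold a connection longer fills the waiting queue sooner. That would be consistent with the slowdown seen once authentication was turned on, though the talk does not state the root cause.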
3rd Performance Test
New feature added: get the count of all bookmarks with a specific tag for a given user
What Went Wrong
The Bookmark.count() query took too long (~55 sec) to return a response.
Learning
- Did we miss the index? No.
- MongoDB used non-counting B-trees as its index data structure, so a count has to walk every matching index entry instead of reading a stored total.
- We started maintaining the counts ourselves in every collection.
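The idea of maintaining counts yourself can be sketched with an in-memory stand-in for the collection. The talk does not show the actual schema, so `BookmarkStore` and its field names are hypothetical; against MongoDB the increment would typically be an `$inc` update on a counter field.

```python
from collections import defaultdict


class BookmarkStore:
    """In-memory stand-in: bookmarks plus one counter per (user, tag)."""

    def __init__(self):
        self._bookmarks = []
        self._tag_counts = defaultdict(int)

    def add(self, user_id, url, tags):
        unique_tags = set(tags)
        self._bookmarks.append({"user": user_id, "url": url, "tags": unique_tags})
        # Update the counters at write time so reads never have to scan.
        for tag in unique_tags:
            self._tag_counts[(user_id, tag)] += 1

    def remove(self, user_id, url):
        for i, bookmark in enumerate(self._bookmarks):
            if bookmark["user"] == user_id and bookmark["url"] == url:
                for tag in bookmark["tags"]:
                    self._tag_counts[(user_id, tag)] -= 1
                del self._bookmarks[i]
                return True
        return False

    def count(self, user_id, tag):
        # Constant-time lookup of the maintained counter.
        return self._tag_counts.get((user_id, tag), 0)

    def count_by_scan(self, user_id, tag):
        # What a count query does without a stored total: visit every match.
        return sum(
            1 for b in self._bookmarks if b["user"] == user_id and tag in b["tags"]
        )


if __name__ == "__main__":
    store = BookmarkStore()
    store.add("user-1", "/recipes/apple-pie", ["recipe", "dessert"])
    store.add("user-1", "/recipes/ramen", ["recipe"])
    print(store.count("user-1", "recipe"), store.count_by_scan("user-1", "recipe"))
```

The trade-off is a little extra work on every write, and the counter has to be kept in step with the bookmarks on every add and remove.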
Final Production Architecture
10,000+ Simultaneous Virtual Users
2000+ Writes/Sec
4th Performance Test - Can Mongo Scale Horizontally Under Load?
Learnings
Adding or removing a shard under load triggers the balancer and splitter background processes, which try to rebalance the data across the shards.
This puts a lot of load on the existing machines, as it increases network traffic and locks the machines for writes.
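To get a feel for how much data the rebalancing described above has to move, the sketch below counts the chunks that must migrate to reach an even spread after new shards join. It is a simplified model: the chunk counts are hypothetical, and it ignores the balancer's migration thresholds, chunk sizes, and the fact that migrations happen over many rounds.

```python
def chunks_to_migrate(chunks_per_shard, new_shards=1):
    """Count the chunks that must move for an even spread after adding shards.

    chunks_per_shard lists the (hypothetical) chunk count on each existing
    shard; new shards start empty.
    """
    counts = sorted(list(chunks_per_shard) + [0] * new_shards, reverse=True)
    base, extra = divmod(sum(counts), len(counts))
    # When the total does not divide evenly, the fullest shards keep one extra.
    targets = [base + 1 if i < extra else base for i in range(len(counts))]
    return sum(max(0, have - want) for have, want in zip(counts, targets))


if __name__ == "__main__":
    # Three shards with 100 chunks each, one shard added under load.
    print(chunks_to_migrate([100, 100, 100]))
```

Every one of those chunks is copied over the network while the cluster is still serving peak traffic, which is why changing the shard count mid-load hurts.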
Thank You
@suchitpuri
Questions?