Scaling to 2000 Writes/Sec with MongoDB

SUCHIT PURI
App Dev with ThoughtWorks
@suchitpuri

LEADING LIFESTYLE CHANNEL IN THE UK

A service from channel4.com that lets you collect and keep bookmarks and recipes from your favorite shows.



When a Show Goes Live, the Load Increases Drastically




And the Response Time Peaks




Challenge

Build the system from scratch to deliver sub-3-second response times under a peak load of 2000 writes/sec.

The Choice We Made


  • Good support from 10gen
  • It is a document store, which is easy to understand and relate to
  • Out-of-the-box horizontal scaling
  • Good monitoring solutions available from 10gen (MMS)
  • Boasted more than 10,000 writes/sec



http://www.severalnines.com/blog/nosql-battle-east-coast-benchmarking-mongodb-vs-tokumx-cluster

Initial Architecture


1st Performance Run

Ran with 9000 virtual users; 7350 of them received HTTP 503 responses.


We were able to scale to only 75 req/sec.
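As a quick check on the figures from this run, the share of virtual users that hit a 503 works out to roughly 82%. The variable names below are illustrative and not taken from the original test harness.

```python
# Figures reported for the first performance run.
virtual_users = 9000
users_with_503 = 7350

# Fraction of virtual users that were served an HTTP 503.
failure_rate = users_with_503 / virtual_users
print(f"{failure_rate:.1%} of virtual users received a 503")
```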


What Went Wrong


  • Front-end servers: reaching maximum CPU utilization because of the image capture (initially only 3 front-end servers).


  • Back end (1 MongoDB shard): mostly waiting for I/O from the MongoDB server.


Learnings - The MongoDB Global Lock Really Hurts


  • Mongo 2.0 - Global lock for the whole mongod process.
  • Mongo 2.2+ - Database-level locking.
  • To effectively increase your write throughput, you need to add more shards.
  • The application runs under w=2 ("replica safe" write mode).
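Why more shards means more writes can be sketched with a back-of-the-envelope model: if every write must hold a lock, each independent lock (one per mongod process, so one per shard) caps throughput at one write per lock-hold interval. The function names and the 2.5 ms lock-hold time below are made-up assumptions for illustration, not measurements from this project, and the model ignores replication and network costs.

```python
import math


def max_writes_per_sec(independent_locks, lock_hold_ms):
    """Ideal ceiling when each write holds one lock for lock_hold_ms."""
    return independent_locks * 1000.0 / lock_hold_ms


def shards_needed(target_wps, lock_hold_ms):
    """Shards required if each shard contributes one independent lock."""
    per_shard = max_writes_per_sec(1, lock_hold_ms)
    return math.ceil(target_wps / per_shard)


# Hypothetical: each write holds the lock for 2.5 ms.
print(max_writes_per_sec(1, 2.5))   # ceiling for a single shard
print(shards_needed(2000, 2.5))     # shards for the 2000 writes/sec target
```

Under w=2 the lock-hold time is only part of the per-write cost, since each write also waits for a second replica to acknowledge it, so real numbers would need to be measured rather than derived.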




Oh... did we forget to configure auth for our DB servers?

2nd Performance Run - With Authentication Turned On


Exception in thread "pool-1-thread-25" com.mongodb.DBPortPool$SemaphoresOut: Out of semaphores to get db connection

Raised [MongoDB-JIRA] (CS-3016) Authentication Causes Decreased Performance in Channel's Java Application
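The exception above comes from the driver's connection pool. A simplified model of how such an error can arise is sketched below: a pool with a fixed number of connections and a cap on how many threads may queue for one. When connections are held longer (for example, by extra authentication work), the queue fills and further callers are rejected. This is an illustrative Python model and not the Java driver's code; the class, its parameters, and the `waiting` counter are hypothetical.

```python
import threading


class SemaphoresOut(Exception):
    """Raised when too many threads are already queued for a connection."""


class ConnectionPool:
    def __init__(self, max_connections, max_waiters):
        self._free = threading.Semaphore(max_connections)
        self._lock = threading.Lock()
        self._max_waiters = max_waiters
        self.waiting = 0  # threads currently inside acquire()

    def acquire(self, timeout=None):
        # Reject immediately if the waiting queue is already full.
        with self._lock:
            if self.waiting >= self._max_waiters:
                raise SemaphoresOut("Out of semaphores to get db connection")
            self.waiting += 1
        try:
            # Block until a connection is free or the timeout expires.
            if not self._free.acquire(timeout=timeout):
                raise TimeoutError("timed out waiting for a connection")
        finally:
            with self._lock:
                self.waiting -= 1

    def release(self):
        self._free.release()
```

With one connection and one waiter slot, a third caller is rejected as soon as the first holds the connection and the second is queued behind it.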






3rd Performance Test




New feature added - get the count of all bookmarks with a specific tag for a single user





What Went Wrong




The Bookmark.count() query took far too long (~55 sec) to return a response.

Learning

  • Did we miss the index? No.
  • MongoDB used non-counting B-trees as its index data structure, so a count has to walk every matching index entry.
    https://jira.mongodb.org/browse/SERVER-1752
  • We started maintaining the count locally in every collection.
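The idea of maintaining the count locally can be sketched as follows: keep a counter per (user, tag) that is updated on every write, so reading the count is a single lookup instead of a scan over matching entries. This is an in-memory stand-in with hypothetical names, not the project's actual schema; in MongoDB the counter update would typically be an `$inc` on a stored field.

```python
from collections import defaultdict


class BookmarkStore:
    """In-memory stand-in for a bookmarks collection (hypothetical)."""

    def __init__(self):
        self._bookmarks = []                 # list of (user, tag, url)
        self._tag_counts = defaultdict(int)  # (user, tag) -> running count

    def add(self, user, tag, url):
        self._bookmarks.append((user, tag, url))
        self._tag_counts[(user, tag)] += 1   # keep the counter in step

    def remove(self, user, tag, url):
        self._bookmarks.remove((user, tag, url))  # ValueError if absent
        self._tag_counts[(user, tag)] -= 1

    def count_by_scan(self, user, tag):
        # What a count without a counter costs: visit every entry.
        return sum(1 for u, t, _ in self._bookmarks if (u, t) == (user, tag))

    def count(self, user, tag):
        # Constant-time read of the maintained counter.
        return self._tag_counts.get((user, tag), 0)
```

The trade-off is an extra counter update on every write in exchange for cheap reads, and the counter must be updated together with the data to stay accurate.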

Final Production Architecture


Final Throughput





10,000+ Simultaneous Virtual Users 
2000+ Writes/Sec



4th Performance Test - Is Mongo Able to Scale Horizontally Under Load?


Learnings


Adding or removing a shard under load triggers the balancer and splitter background processes, which try to rebalance data across the shards.

This puts a lot of load on the existing machines, as it increases network traffic and locks the machines for writes.
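To get a feel for how much data movement a new shard can trigger, here is a deliberately simplified model of rebalancing: chunks move one at a time from the fullest shard to the emptiest until the spread is within a threshold. This is not MongoDB's actual balancer algorithm, and the chunk counts are hypothetical.

```python
def rebalance(chunks_per_shard, threshold=1):
    """Return (final distribution, number of chunk migrations)."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1 to guarantee termination")
    shards = list(chunks_per_shard)
    migrations = 0
    # Keep moving chunks while the fullest and emptiest shards differ too much.
    while max(shards) - min(shards) > threshold:
        src = shards.index(max(shards))
        dst = shards.index(min(shards))
        shards[src] -= 1
        shards[dst] += 1
        migrations += 1
    return shards, migrations


# Hypothetical cluster: three shards with 40 chunks each, plus one empty new shard.
final, moves = rebalance([40, 40, 40, 0])
print(final, moves)
```

Every one of those migrations copies data over the network and competes with application writes on the source and destination shards, which is why doing this at peak load hurts.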



Thank You

@suchitpuri 


Questions?

