Changing for MongoDB
Thierry Delprat
tdelprat@nuxeo.com
https://github.com/tiry/


Switching to MongoDB

What does the switch to MongoDB change?
For developers & architects
For ops and users of the product

Some Context
    
We provide a platform that developers can use to build highly customized content applications
We provide components, and the tools to assemble them
Everything we do is open source
 
https://github.com/nuxeo

Nuxeo Platform & Storage
    
Content Repository
Storage
Simplify software architecture
 
Offer easy scalability options
 
Small impact on development


Simplify Architecture
Making work easier for Ops & Architects
IMPEDANCE ISSUE
    
Object
Coding & Maintenance Impact
    
No Lazy Loading
No Cache / No Invalidations
A lot of complexity and problems avoided!
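A minimal sketch of the impedance issue, assuming a hypothetical document split over three tables (the table and field names are illustrative, not the actual Nuxeo schema). It compares reassembling an object from rows with fetching it as one nested object:

```python
# Hypothetical normalized layout: one document spread over three tables.
TABLES = {
    "hierarchy": [{"id": "doc1", "parent": "root", "name": "contract"}],
    "properties": [{"id": "doc1", "title": "Contract", "creator": "alice"}],
    "tags": [
        {"doc_id": "doc1", "label": "legal"},
        {"doc_id": "doc1", "label": "2016"},
    ],
}

# The same document stored as a single nested object.
COLLECTION = {
    "doc1": {
        "id": "doc1", "parent": "root", "name": "contract",
        "title": "Contract", "creator": "alice",
        "tags": ["legal", "2016"],
    }
}


def load_from_tables(doc_id, tables):
    """Reassemble one object from several tables; returns (document, backend calls)."""
    calls = 0
    doc = {}
    for table in ("hierarchy", "properties"):
        calls += 1  # one query per table
        for row in tables[table]:
            if row["id"] == doc_id:
                doc.update(row)
    calls += 1  # one more query for the multi-valued property
    doc["tags"] = [r["label"] for r in tables["tags"] if r["doc_id"] == doc_id]
    return doc, calls


def load_from_collection(doc_id, collection):
    """Fetch the whole object in a single call."""
    return collection[doc_id], 1


row_doc, row_calls = load_from_tables("doc1", TABLES)
nested_doc, nested_calls = load_from_collection("doc1", COLLECTION)
print(row_calls, nested_calls)
```

Both loaders return the same object. The row-based one needs one call per table, which is the kind of cost that lazy loading and caches usually try to hide.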
Example: Impact on Nuxeo deployment
    
Simplify deployment architecture
Hybrid Storage
Complex structures (schema) - R/W - Synchronous: Document properties and hierarchy
Large Streams - Large Storage: attached Blobs
Flexible Schema - Write Once/Read Many: Audit log, Activity log
Flexible Schema - Search: Search index
Hybrid Storage
GridFS: Large Streams - Large Storage (attached Blobs)
CONSOLIDATED Storage
Single Consolidated Storage
Structure, Blobs, Audit & Index
    Fewer building blocks to provision & configure
Easier to deploy

EASIER to Deploy a Robust Architecture
"Built-in" data redundancy & fault tolerance (active / active)
Simplicity?
No ORM Hell
Single storage
Out-of-the-box robust deployment

Scalability
Avoid headaches at deployment time
Improve end-user experience
Will I be faster with MongoDB?
Built for SPEED
No impedance issue
- fewer backend calls
- no invalidation cost
Document-level locking
- no table-level lock contention
Native distributed architecture
- easy scale-out of reads

SPEED

Significant RAW Speed improvements for all use cases
More importantly: some use cases are much better handled

https://benchmarks.nuxeo.com/continuous/index.html
More than RAW Performance
Handle more concurrent connections
No Cache
Less memory per connection
Can handle more connections
Can handle more concurrent users
[Figure: c4.xlarge (Nuxeo), c4.2xlarge (DB)]
Read & Write operations are competing (SQL)
Write operations are not blocked
WRITEs are not blocked by READs
More than RAW Performance
Processing large sets of objects is challenging with an ORM
- lazy loading
- cache thrashing
No side effects of impedance mismatch
Sample batch on 100,000 documents
- 750 documents/s with SQL backend (cold cache)
- 11,500 documents/s with MongoDB / WiredTiger: x15
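As a worked check of the batch figures above, the short sketch below turns the two throughput numbers into total batch durations and a speed-up ratio. Only the three input figures come from the slide; the function name is illustrative.

```python
BATCH_SIZE = 100_000   # documents in the sample batch
SQL_RATE = 750         # documents/s, SQL backend (cold cache)
MONGO_RATE = 11_500    # documents/s, MongoDB / WiredTiger


def batch_seconds(n_docs, docs_per_second):
    """Time needed to process n_docs at a constant throughput."""
    return n_docs / docs_per_second


sql_seconds = batch_seconds(BATCH_SIZE, SQL_RATE)
mongo_seconds = batch_seconds(BATCH_SIZE, MONGO_RATE)
speedup = MONGO_RATE / SQL_RATE

print(f"SQL:     {sql_seconds:.1f} s")    # 133.3 s
print(f"MongoDB: {mongo_seconds:.1f} s")  # 8.7 s
print(f"Speed-up: x{speedup:.1f}")        # x15.3
```

At these rates the same batch goes from a bit over two minutes to under ten seconds, which is where the "x15" on the slide comes from.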

Will I scale better with MongoDB?
Scalability options
Scale out READs
- Leverage ReplicaSets (read from secondaries)
Scale out WRITEs
- Leverage Sharding (spread writes)
No impact at application level!
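The two options above can be sketched as a small pure-Python simulation. This is not how the MongoDB driver or router is implemented: the `ReplicaSet` class, the `shard_for` function and the node names are all hypothetical, and the hash-modulo routing only stands in for the idea of a hashed shard key. In a real deployment, reading from secondaries is selected through a read preference such as `secondaryPreferred`.

```python
import hashlib
from itertools import cycle


class ReplicaSet:
    """Hypothetical replica set: the first member is primary, the rest are secondaries."""

    def __init__(self, name, members):
        self.name = name
        self.primary = members[0]
        self.secondaries = members[1:]
        # Rotate reads over the secondaries (fall back to the primary if there are none).
        self._read_targets = cycle(self.secondaries or [self.primary])

    def node_for_write(self):
        return self.primary

    def node_for_read(self):
        return next(self._read_targets)


def shard_for(doc_id, shards):
    """Pick a shard from a hash of the document id, so writes spread over all shards."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


shards = [
    ReplicaSet("rs0", ["a1", "a2", "a3"]),
    ReplicaSet("rs1", ["b1", "b2", "b3"]),
    ReplicaSet("rs2", ["c1", "c2", "c3"]),
]

target = shard_for("doc-42", shards)
print(target.name, target.node_for_write(), target.node_for_read())
```

The application only asks for a read or a write on a document id. Which node answers is decided below that level, which is what "no impact at application level" relies on.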
Scale out Test
Use massive read operations and queries.
1 Nuxeo node + 1 MongoDB node: 1900 docs/s
- MongoDB CPU is the bottleneck (800%)
2 Nuxeo nodes + 1 MongoDB node: 1850 docs/s
- MongoDB CPU is the bottleneck (800%)
2 Nuxeo nodes + 2 MongoDB nodes: 3400 docs/s
- using read preferences
 
SHARDING TEST
Use bulk import.
2 Nuxeo nodes + 1 MongoDB ReplicaSet: 11,000 docs/s
2 Nuxeo nodes + 3 MongoDB Sharded ReplicaSets: 27,400 docs/s
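To put the scale-out and sharding figures above in perspective, the sketch below computes the speed-up and a rough per-node scaling efficiency. The throughput numbers are the ones quoted on the slides; the `scaling` helper and the choice of the 2 Nuxeo + 1 MongoDB run as the read baseline are assumptions made for this illustration.

```python
def scaling(base_rate, scaled_rate, node_factor):
    """Return (speed-up, efficiency), where efficiency is speed-up per added backend unit."""
    speedup = scaled_rate / base_rate
    return speedup, speedup / node_factor


# Read test: 2 Nuxeo nodes, going from 1 to 2 MongoDB nodes.
read_speedup, read_eff = scaling(1850, 3400, node_factor=2)

# Write test (bulk import): 2 Nuxeo nodes, going from 1 ReplicaSet to 3 shards.
write_speedup, write_eff = scaling(11_000, 27_400, node_factor=3)

print(f"reads:  x{read_speedup:.2f}, efficiency {read_eff:.0%}")    # x1.84, 92%
print(f"writes: x{write_speedup:.2f}, efficiency {write_eff:.0%}")  # x2.49, 83%
```

Neither result is perfectly linear, but both stay above 80% of the ideal.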
DEVELOPMENT IMPACT
Changes from a development point of view
New Storage Model
Document level transactions
No MVCC isolation
Provide shared mitigation policies for critical use cases
Different transaction paradigm
Consistency in our Context
Atomic Document Operations are safe
Large batch updates cannot be atomic
Find a way to mitigate application level impact
Transactions cannot span multiple documents
Multi-document transactions can be problematic
Workflows or custom event handlers

Ensuring consistency
Transient State Manager
Run all operations in memory
Populate an Undo Log
- Recover application-level transaction management
- Commit / Rollback model
- "Read uncommitted" isolation
- Need to flush transient state for queries
- "Uncommitted" changes are visible to others
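The transient state and undo log described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Nuxeo implementation: the `TransientState` class and its method names are hypothetical, a plain dict stands in for the database, and documents are assumed to be dicts so that `None` can mean "document did not exist".

```python
import copy


class TransientState:
    """Application-level commit/rollback on top of a store with per-document writes only."""

    def __init__(self, store):
        self.store = store    # dict: doc_id -> document, stands in for the database
        self.pending = {}     # changes held in memory, not yet written
        self.undo_log = []    # (doc_id, previous document or None) for each flushed write

    def set(self, doc_id, doc):
        self.pending[doc_id] = copy.deepcopy(doc)

    def get(self, doc_id):
        # Our own pending changes win over what is in the store.
        if doc_id in self.pending:
            return self.pending[doc_id]
        return self.store.get(doc_id)

    def flush(self):
        """Write pending changes so queries can see them, recording how to undo each one."""
        for doc_id, doc in self.pending.items():
            self.undo_log.append((doc_id, copy.deepcopy(self.store.get(doc_id))))
            self.store[doc_id] = doc  # from here on, others can read this uncommitted value
        self.pending.clear()

    def commit(self):
        self.flush()
        self.undo_log.clear()

    def rollback(self):
        """Drop pending changes and replay the undo log in reverse order."""
        self.pending.clear()
        for doc_id, previous in reversed(self.undo_log):
            if previous is None:
                self.store.pop(doc_id, None)  # the document did not exist before
            else:
                self.store[doc_id] = previous
        self.undo_log.clear()


store = {"a": {"v": 1}}
tx = TransientState(store)
tx.set("a", {"v": 2})
tx.set("b", {"v": 9})
tx.flush()      # needed before running a query; changes become visible to others
tx.rollback()   # the store is back to its initial content
print(store)
```

The flush step is where the "read uncommitted" trade-off shows up: between `flush()` and `rollback()`, any other reader of `store` sees values that may later be undone.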
 
 
Inertia
New Model
New API
New Query system
    Provide an easy migration path


Nuxeo Approach
High level API + Encapsulation
Storage Adapters
    
DOCUMENT REPOSITORY
    
    
Helps with transitioning between storages
No impact at application level
Can be a deployment-time choice
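A minimal sketch of the "high level API + storage adapters" idea, assuming hypothetical class names (`StorageAdapter`, `RowStorage`, `DocumentStorage`, `DocumentRepository`) that are not part of the Nuxeo API. Two in-memory adapters stand in for a SQL and a MongoDB backend, and the backend is picked by a configuration value:

```python
from abc import ABC, abstractmethod


class StorageAdapter(ABC):
    """What the repository needs from any storage backend."""

    @abstractmethod
    def save(self, doc_id, properties): ...

    @abstractmethod
    def load(self, doc_id): ...


class RowStorage(StorageAdapter):
    """Stands in for a SQL backend: one (doc_id, key, value) row per property."""

    def __init__(self):
        self.rows = []

    def save(self, doc_id, properties):
        self.rows = [r for r in self.rows if r[0] != doc_id]
        self.rows.extend((doc_id, k, v) for k, v in properties.items())

    def load(self, doc_id):
        return {k: v for d, k, v in self.rows if d == doc_id}


class DocumentStorage(StorageAdapter):
    """Stands in for a document backend: one nested object per document."""

    def __init__(self):
        self.docs = {}

    def save(self, doc_id, properties):
        self.docs[doc_id] = dict(properties)

    def load(self, doc_id):
        return dict(self.docs.get(doc_id, {}))


class DocumentRepository:
    """High-level API used by application code; it never sees the backend type."""

    def __init__(self, adapter):
        self._adapter = adapter

    def create(self, doc_id, **properties):
        self._adapter.save(doc_id, properties)
        return self.get(doc_id)

    def get(self, doc_id):
        return self._adapter.load(doc_id)


ADAPTERS = {"sql": RowStorage, "mongodb": DocumentStorage}


def open_repository(backend):
    """Choose the adapter from a deployment-time setting."""
    return DocumentRepository(ADAPTERS[backend]())


repo = open_repository("mongodb")
print(repo.create("doc1", title="Contract", creator="alice"))
```

Application code calls `create` and `get` the same way for both backends, so switching storage reduces to changing the value passed to `open_repository`.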
Takeaways
Changing for MongoDB can
- Simplify architecture
- Offer simple scalability options
- Be an easy migration
Content Management + MongoDB
You should try Nuxeo!
Any Questions?
Thank You!
https://github.com/nuxeo
http://www.nuxeo.com/careers/