Changing for MongoDB

Thierry Delprat
tdelprat@nuxeo.com

https://github.com/tiry/

Switching to MongoDB

What does the switch to MongoDB change?

For developers & architects 

For ops and users of the product


Some Context

We provide a platform that developers can use to
build highly customized content applications

We provide components, and the tools to assemble them

Everything we do is open source

 

https://github.com/nuxeo

Nuxeo Platform & Storage

Content Repository

Storage


Simplify software architecture
 

Offer easy scalability options
 

Small impact on development  


Simplify Architecture

Making work easier for Ops & Architects

IMPEDANCE ISSUE

Object


Coding & Maintenance Impact

No Lazy Loading
No Cache / No Invalidations

A lot of complexity and problems avoided!
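The impedance issue can be illustrated with a small sketch. It only uses in-memory dictionaries, and the table and field names (`hierarchy`, `dublincore`, `file`) are hypothetical placeholders, not the actual Nuxeo schema. The point is simply that a relational layout needs one lookup per table to rebuild an object, while a document layout returns the same object in one lookup.

```python
# Illustrative only: table and field names are hypothetical, not the Nuxeo schema.

# Relational layout: one logical document is spread over several tables,
# so loading it means one lookup per table (or lazy loading later on).
TABLES = {
    "hierarchy": {"doc-1": {"parent": "folder-1", "name": "report"}},
    "dublincore": {"doc-1": {"title": "Q3 report", "creator": "alice"}},
    "file": {"doc-1": {"filename": "q3.pdf", "length": 1024}},
}

# Document layout: the same object stored as a single nested document.
COLLECTION = {
    "doc-1": {
        "parent": "folder-1",
        "name": "report",
        "dublincore": {"title": "Q3 report", "creator": "alice"},
        "file": {"filename": "q3.pdf", "length": 1024},
    }
}


def load_relational(doc_id):
    """Assemble the object from every table; returns (object, lookups)."""
    obj, lookups = {}, 0
    for table, rows in TABLES.items():
        lookups += 1
        row = rows[doc_id]
        if table == "hierarchy":
            obj.update(row)          # top-level properties
        else:
            obj[table] = dict(row)   # nested schema
    return obj, lookups


def load_document(doc_id):
    """Fetch the whole object in a single lookup."""
    return COLLECTION[doc_id], 1
```

Both loaders return the same object; only the number of backend lookups differs, which is where lazy loading and caching usually come in on the relational side.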


Example: Impact on Nuxeo deployment


Simplify deployment architecture

Hybrid Storage

Complex structures (schema) - R/W - Synchronous
  • Document properties and hierarchy

Large Streams - Large Storage
  • Attached Blobs

Flexible Schema - Write Once/Read Many
  • Audit log, Activity log

Flexible Schema - Search
  • Search index

Hybrid Storage

GridFS


CONSOLIDATED Storage

Single Consolidated Storage
Structure, Blobs, Audit & Index

Fewer building blocks to provision & configure

Easier to deploy

EASIER to Deploy a Robust Architecture

 

"built-in" - data redundancy & fault tolerance

active / active

Simplicity?

No ORM Hell

Single storage

Out-of-the-box robust deployment

Scalability

Avoid headaches at deployment time

Improve end-user experience

Will I be faster
with MongoDB?

Built for SPEED

No impedance issue
  • fewer backend calls
  • no invalidation cost

Document-level locking
  • no table-level concurrency issues

Native distributed architecture
  • easy scale-out of reads

SPEED

Significant RAW Speed improvements for all use cases

More importantly: some use cases are much better handled

https://benchmarks.nuxeo.com/continuous/index.html

More than RAW Performance

Handle more concurrent connections

No Cache

Less memory per Connection 

Can handle more connections

Can handle more concurrent Users

MORE THAN RAW PERFORMANCES

Read & Write Operations
are competing

Write Operations
are not blocked

C4.xlarge (nuxeo)
C4.2Xlarge (DB)

SQL

WRITEs are not blocked by READs

More than RAW Performances

Processing on large Objects sets is challenging with ORM

No side effects of impedance mismatch

Sample batch on 100,000 documents

750 documents/s with SQL backend (cold cache)

11,500 documents/s with MongoDB / WiredTiger: x15

lazy loading

cache thrashing
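To put the batch figures in perspective, the throughput numbers quoted above can be turned into wall-clock time for the 100,000-document batch. This is plain arithmetic on the two rates from the slide; nothing else is measured here.

```python
DOCS = 100_000
SQL_RATE = 750        # documents/s, SQL backend, cold cache (from the slide)
MONGO_RATE = 11_500   # documents/s, MongoDB / WiredTiger (from the slide)

sql_seconds = DOCS / SQL_RATE        # time to process the batch on SQL
mongo_seconds = DOCS / MONGO_RATE    # time to process the batch on MongoDB
speedup = MONGO_RATE / SQL_RATE      # ratio of the two throughputs

print(f"SQL:     {sql_seconds:.0f} s")
print(f"MongoDB: {mongo_seconds:.1f} s")
print(f"speedup: x{speedup:.1f}")
```

In other words, a batch that takes a little over two minutes on the SQL backend finishes in under nine seconds, which is consistent with the rounded x15 figure.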

Will I scale better
with MongoDB?

Scalability options

Scale out READs

  • Leverage ReplicaSets
    (Read from secondaries)

Scale out WRITEs

  • Leverage Sharding
    (Spread Writes)

No impact at application level!
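Reading from secondaries is typically a connection-level setting rather than an application change. The sketch below builds a standard MongoDB connection string using the `replicaSet` and `readPreference` URI options. The host names and the replica set name `rs0` are made up for illustration, and how a given Nuxeo deployment consumes this URI is not covered here.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, read_preference="secondaryPreferred"):
    """Build a MongoDB connection string that allows reads from secondaries."""
    options = urlencode(
        {"replicaSet": replica_set, "readPreference": read_preference}
    )
    return f"mongodb://{','.join(hosts)}/?{options}"


# Hypothetical hosts and replica set name, for illustration only.
uri = replica_set_uri(
    ["db1.example.com:27017", "db2.example.com:27017"], "rs0"
)
print(uri)
```

With `secondaryPreferred`, reads go to a secondary when one is available and fall back to the primary otherwise, so the trade-off is that a read may return slightly stale data.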

Scale out Test

Workload: massive read operations and queries.

  • 1 Nuxeo node + 1 MongoDB node
    1900 docs/s
    MongoDB CPU is the bottleneck (800%)

  • 2 Nuxeo nodes + 1 MongoDB node
    1850 docs/s
    MongoDB CPU is the bottleneck (800%)

  • 2 Nuxeo nodes + 2 MongoDB nodes
    3400 docs/s
    (using read preferences)

 

SHARDING TEST

Workload: bulk import.

  • 2 Nuxeo nodes + 1 MongoDB ReplicaSet
    11,000 docs/s

  • 2 Nuxeo nodes + 3 sharded MongoDB ReplicaSets
    27,400 docs/s
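Sharding helps bulk imports because writes are spread over several ReplicaSets instead of queuing on one primary; in the test above, three shards give roughly 2.5 times the single-ReplicaSet throughput. The toy sketch below shows the general idea of routing writes by a hash of the document id. It is not MongoDB's actual hashing or chunk balancing, and the shard names are hypothetical.

```python
import hashlib
from collections import Counter

SHARDS = ["shard-a", "shard-b", "shard-c"]  # hypothetical shard names


def route(doc_id):
    """Pick a shard from a hash of the document id.

    Toy stand-in for a hashed shard key; MongoDB's real routing differs.
    """
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


# Simulate a bulk import and count how many writes each shard receives.
writes = Counter(route(f"doc-{i}") for i in range(30_000))
for shard in SHARDS:
    print(shard, writes[shard])
```

Because the hash spreads ids roughly evenly, each shard receives about a third of the writes, which is what lets write throughput grow with the number of shards.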

DEVELOPMENT IMPACT

Changes from a development point of view

New Storage Model

Document level transactions
 

No MVCC isolation

Provide shared mitigation policies for critical use cases

Different transaction paradigm

Consistency in our Context

Atomic Document  Operations are safe

Large batch updates cannot be atomic

Find a way to mitigate application-level impact

Transactions cannot span multiple documents

Multi-document transactions can be problematic

Workflows or custom event handlers

Ensuring consistency

Transient State Manager

Run all operations in Memory

Populate an Undo Log

  • Recover application-level transaction management
    • Commit / Rollback model

  • "Read uncommitted" isolation
    • Need to flush transient state for queries
    • "uncommitted" changes are visible to others
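The Transient State Manager idea can be sketched as follows. This is a minimal illustration under simplifying assumptions, not Nuxeo's implementation: the store is a plain dictionary standing in for a database where each single-document write is atomic, and the class and method names (`TransientSession`, `save`, `flush`, `query`) are invented for the example.

```python
_MISSING = object()  # marks "document did not exist before the write"


class TransientSession:
    """Toy application-level transaction over a per-document-atomic store."""

    def __init__(self, store):
        self.store = store      # shared dict standing in for the database
        self.transient = {}     # changes kept in memory until flushed
        self.undo_log = []      # (doc_id, previous value) for flushed writes

    def save(self, doc_id, doc):
        """Record a change in memory only."""
        self.transient[doc_id] = doc

    def flush(self):
        """Write transient state to the store and record how to undo it."""
        for doc_id, doc in self.transient.items():
            self.undo_log.append((doc_id, self.store.get(doc_id, _MISSING)))
            self.store[doc_id] = doc
        self.transient.clear()

    def query(self, predicate):
        """Queries run against the store, so transient state is flushed first."""
        self.flush()
        return sorted(k for k, v in self.store.items() if predicate(v))

    def commit(self):
        """Make all changes permanent by dropping the undo log."""
        self.flush()
        self.undo_log.clear()

    def rollback(self):
        """Drop unflushed changes and replay the undo log in reverse order."""
        self.transient.clear()
        for doc_id, previous in reversed(self.undo_log):
            if previous is _MISSING:
                self.store.pop(doc_id, None)
            else:
                self.store[doc_id] = previous
        self.undo_log.clear()
```

A short usage example: after `save` nothing is visible in the store; after `query` the changes have been flushed and any other reader of the store can see them (the "read uncommitted" behaviour); `rollback` then restores the previous values from the undo log.

```python
store = {"a": {"state": "draft"}}
session = TransientSession(store)
session.save("a", {"state": "approved"})
session.save("b", {"state": "approved"})
session.query(lambda d: d["state"] == "approved")  # forces a flush
session.rollback()                                  # store is back to its initial state
```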

Inertia

New Model
New API

New Query system

Provide an easy migration path

Nuxeo Approach

High level API + Encapsulation

Storage Adapters

DOCUMENT REPOSITORY

Helps transitioning between storages


No Impact at application level

Can be a deployment-time choice
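The storage adapter approach can be sketched as an interface with interchangeable backends. Everything below is hypothetical and in-memory: `DocumentStorage`, the two backend classes and `create_repository` are illustration names, not the Nuxeo API. The sketch only shows why application code written against a high-level API does not change when the backend is picked at deployment time.

```python
from abc import ABC, abstractmethod


class DocumentStorage(ABC):
    """Hypothetical storage adapter interface; not the actual Nuxeo API."""

    @abstractmethod
    def save(self, doc_id, properties):
        ...

    @abstractmethod
    def load(self, doc_id):
        ...


class RowBasedStorage(DocumentStorage):
    """Stores one row per property, as a relational backend might."""

    def __init__(self):
        self.rows = []  # (doc_id, property name, value)

    def save(self, doc_id, properties):
        self.rows = [r for r in self.rows if r[0] != doc_id]
        self.rows.extend((doc_id, k, v) for k, v in properties.items())

    def load(self, doc_id):
        return {k: v for d, k, v in self.rows if d == doc_id}


class DocumentBasedStorage(DocumentStorage):
    """Stores each document as one object, as a document database might."""

    def __init__(self):
        self.docs = {}

    def save(self, doc_id, properties):
        self.docs[doc_id] = dict(properties)

    def load(self, doc_id):
        return dict(self.docs[doc_id])


# The backend is selected from configuration, not from application code.
BACKENDS = {"sql": RowBasedStorage, "mongodb": DocumentBasedStorage}


def create_repository(backend):
    return BACKENDS[backend]()


def application_code(storage):
    """Application logic that only knows the high-level interface."""
    storage.save("doc-1", {"title": "Hello"})
    return storage.load("doc-1")["title"]
```

Calling `application_code(create_repository("sql"))` and `application_code(create_repository("mongodb"))` runs the same logic against either backend.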

TakeAways

Switching to MongoDB can:

  • Simplify architecture
  • Offer simple scalability options
  • Be an easy migration

Content Management + MongoDB

You should try Nuxeo!

Any questions?

Thank you!

https://github.com/nuxeo

http://www.nuxeo.com/careers/