Using MongoDB
to build a fast and scalable Content Repository

Some Context

What We Do and What Problems We Try to Solve

Nuxeo

We provide a Platform that developers can use to build highly customized Content Applications

We provide components, and the tools to assemble them

Everything we do is open source


 

Various customers - various use cases




 

Track game builds

Electronic Flight Bags

Central repository for Models

Food industry PLM

https://github.com/nuxeo

DOCUMENT REPOSITORY

SQL based Repository - VCS

Key limitations of the SQL approach

  • Impedance issue
    • storing Documents in tables is not easy
    • requires caching and lazy loading

  • Scalability
    • a Document repository can become very large (versions, workflows ...)
    • scaling out a SQL DB is very complex (and never transparent)

  • Concurrency model
    • heavy write loads are an issue (Quotas, Inheritance)
    • hard to maintain good read & write performance

Need a different
storage model!

No SQL

Integrating MongoDB

Inside nuxeo-dbs storage adapter

From SQL to MongoDB Storage

Document-Based Storage & MongoDB

Storing Nuxeo Documents in MongoDB

{  
   "ecm:id":"52a7352b-041e-49ed-8676-328ce90cc103",
   "ecm:primaryType":"MyFile",
   "ecm:majorVersion":NumberLong(2),
   "ecm:minorVersion":NumberLong(0),
   "dc:title":"My Document",
   "dc:contributors":[ "bob", "pete", "mary" ],
   "dc:created":   ISODate("2014-07-03T12:15:07+0200"), 
...
   "cust:primaryAddress":{  
      "street":"1 rue René Clair", "zip":"75018", "city":"Paris", "country":"France"},
   "files:files":[  
      {  "name":"doc.txt", "length":1234, "mime-type":"plain/text",
         "data":"0111fefdc8b14738067e54f30e568115"
      },
      {  
         "name":"doc.pdf", "length":29344, "mime-type":"application/pdf",
         "data":"20f42df3221d61cb3e6ab8916b248216"
      }
   ],
   "ecm:acp":[  
      {  
         name:"local",
         acl:[ { "grant":false, "perm":"Write", "user":"bob"},
               { "grant":true,  "perm":"Read", "user":"members" } ]
      }]
...
}
  • 40+ fields by default
    • depends on config
  • 18 indexes
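A hedged sketch of what a few of those indexes could look like in the mongo shell; the exact set of 18 indexes depends on the configuration, the fields come from the sample document above, and the collection name default matches the query examples later in this deck:

// Illustrative only: a handful of the indexes such a collection needs
db.default.createIndex({ "ecm:parentId": 1 })      // children lookups
db.default.createIndex({ "ecm:ancestorIds": 1 })   // multikey index for subtree queries
db.default.createIndex({ "ecm:racl": 1 })          // security filtering at query time
db.default.createIndex({ "dc:created": 1 })        // common filter/sort field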

Hierarchy & Security

  • Parent-child relationship stored in ecm:parentId

  • Recursion optimized through the ecm:ancestorIds array (see the query sketch after the sample below)
{ ... "ecm:parentId" : "3d7efffe-e36b-44bd-8d2e-d8a70c233e9d", 
      "ecm:ancestorIds" : [ "00000000-0000-0000-0000-000000000000", 
                            "4f5c0e28-86cf-47b3-8269-2db2d8055848", 
                            "3d7efffe-e36b-44bd-8d2e-d8a70c233e9d" ] ...}
  • Generic ACP stored in ecm:acp field


     
  • Precomputed Read ACLs to avoid post-filtering on search
ecm:racl: ["Management", "Supervisors", "bob"]
{... "ecm:acp":[ {  
              name:"local",
              acl:[ { "grant":false, "perm":"Write", "user":"bob"},
                    { "grant":true,  "perm":"Read", "user":"members" } ]}] ...}

SEARCH

db.default.find({
   $and: [
   {"dc:title": { $in: ["Workspaces", "Sections"] } },
   {"ecm:racl": {"$in": ["bob", "members", "Everyone"]}}
   ]
 }
)
Translated from the NXQL query: SELECT * FROM Document WHERE dc:title = 'Sections' OR dc:title = 'Workspaces'
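The second clause of the MongoDB query above is the security filter: the caller's principals are matched against the precomputed ecm:racl array instead of post-filtering the results. A minimal sketch of that rewriting, where buildSecureQuery is a hypothetical helper rather than the actual Nuxeo API:

// Hypothetical helper: append the security clause to the translated query
function buildSecureQuery(query, principals) {
  return { $and: [ query, { "ecm:racl": { $in: principals } } ] };
}

db.default.find(buildSecureQuery(
  { "dc:title": { $in: ["Workspaces", "Sections"] } },
  ["bob", "members", "Everyone"]
))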

Storing Blobs

About Consistency

  • Atomic single-document operations are safe
  • Large batch updates are not so much of an issue

  • Multi-document transactions are an issue
    • ex: Workflows

Find a way to mitigate consistency issues

Transactions cannot span multiple documents

Mitigating consistency issues

  • Transient State Manager (sketch below)
    • run all operations in memory
    • populate an undo log
  • Recover partial transaction management
    • Commit / Rollback model

  • "Read uncommitted" isolation
    • need to flush the transient state for queries
    • "uncommitted" changes are visible to others

Hybrid Storage Architecture

  • MongoDB

    • store structure & streams in a BASE way
       

  • elasticsearch

    • provide powerful and scalable queries
       

  • SQL DB

    • store structures in an ACID way

Storage does not impact the application: this can be a deployment choice!

ACID: Atomic, Consistent, Isolated, Durable

BASE: Basic Availability, Soft state, Eventually consistent

Depends on availability & performance requirements

Nuxeo + MongoDB

Supercharge your Content Repository

No SQL with MongoDB

  • No impedance issue
    • one Nuxeo Document = one MongoDB Document

  • No scalability issue for CRUD
    • the native distributed architecture allows scale-out

  • No concurrency performance issue
    • document-level "transactions" (sketch below)

  • No application-level cache needed
    • no need to manage invalidations
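What document-level "transactions" means in practice is that a single-document update is atomic: all modified fields of one Nuxeo document change together or not at all. A minimal sketch against the sample document above ("joe" is just an illustrative value):

// Atomic on one document: the title change and the contributor append are applied together
db.default.update(
  { "ecm:id": "52a7352b-041e-49ed-8676-328ce90cc103" },
  { $set: { "dc:title": "My Document (renamed)" },
    $addToSet: { "dc:contributors": "joe" } }
)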

Is Nuxeo Fast
with MongoDB?

Nightly CI Benchmarks

Low-level read
(fast re-indexing with elasticsearch)

  • 3,500 documents/s using the SQL backend

  • 10,000 documents/s using MongoDB

Raw Java API performance

Single server
6-core HT, 3.5 GHz
126 GB RAM
standard HDD

about 3 times faster

NIGHTLY CI BENCHMARKS

Read via REST API (misc Read APIs)  

  • 900 docs/s using SQL backend
  • 1,300 docs/s using MongoDB (+40%)
  • 1,370 docs/s using MongoDB/wiredTiger (+50%)

Raw HTTP API performance

Single server
6-core HT, 3.5 GHz
126 GB RAM
standard HDD

Update via REST API

  • 540 docs/s using the SQL backend
  • 910 docs/s using MongoDB (+70%)
  • 990 docs/s using MongoDB/wiredTiger (+80%)

Storage is no longer the bottleneck

HUGE Repository - Heavy loading

The SQL DB collapses / MongoDB handles the volume

Massive number
of documents

Automatic versioning

Write intensive
Daily Imports

Benchmarking Mass Import

[Chart: mass import throughput on commodity hardware, MongoDB vs. SQL with tuning; MongoDB about 7x faster]

Benchmarking Read + Write

Read & Write Operations
are competing

Write Operations
are not blocked

C4.xlarge (nuxeo)
C4.2Xlarge (DB)

SQL

Data Loading Overflow

Processing large document sets is an issue with SQL

Side effects of the impedance mismatch

Ex: process 100,000 documents

  • 750 documents/s with SQL backend (cold cache)
  • 9,500 documents/s with MongoDB / mmapv1: x13

  • 11,500 documents/s with MongoDB / wiredTiger: x15

lazy loading

cache thrashing

Benchmarking Scale out

  • 1 Nuxeo node + 1 MongoDB node
    • 1,900 docs/s
    • MongoDB CPU is the bottleneck (800%)

  • 2 Nuxeo nodes + 1 MongoDB node
    • 1,850 docs/s
    • MongoDB CPU is the bottleneck (800%)

  • 2 Nuxeo nodes + 2 MongoDB nodes
    • 3,400 docs/s when using read preferences (sketch below)

Use read preferences for massive read operations and queries.
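Scaling reads past a single MongoDB node relies on read preferences, so that queries can also be served by secondaries; a minimal sketch in the mongo shell, where the database and node names are placeholders and reads from secondaries are only eventually consistent:

// Allow reads from secondaries for this connection
db.getMongo().setReadPref("secondaryPreferred")
db.default.find({ "ecm:racl": { $in: ["bob", "members", "Everyone"] } })

// Or directly in the connection string used by the application
// mongodb://node1,node2/nuxeo?readPreference=secondaryPreferred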

Document repository
is Fast
with MongoDB

So, Yes!

Example use case?

VOD repository

  • Requirements:
    • store videos
    • manage meta-data & availability
    • manage workflows
    • generate thumbs & conversions
  • Very Large Objects:
    • lots of metadata (Dublin Core, ADI, ratings ...)
  • Massive daily updates
    • updates on rights and availability
  • Need to track all changes
    • prove what the availability was on a given date

A real-life project choosing Nuxeo with a MongoDB backend

good use case for MongoDB
want to use MongoDB

lots of data + lots of updates

Any Questions?

Thank You!

https://github.com/nuxeo

http://www.nuxeo.com/careers/

20min - Using MongoDB to build a fast and scalable content repository

By Thierry Delprat


MongoDB can be used in the Nuxeo Platform as a replacement for more traditional SQL databases. Nuxeo's content repository, which is the cornerstone of this open source software platform, can now completely rely on MongoDB for data storage. This presentation will explain the motivation for using MongoDB and will emphasize the different implementation choices driven by the very nature of a NoSQL datastore like MongoDB. It will give you the opportunity to see how the integration in the Nuxeo Platform was achieved and what benefits were gained: increased performance (with actual benchmarks) and better response to some use cases.
