What We Do and What Problems We Try to Solve
Track game builds
Electronic Flight Bags
Central repository for Models
Food industry PLM
https://github.com/nuxeo
Document Oriented Database
Document Repository
Store JSON Documents
Manage document attributes, hierarchy, blobs, security, lifecycle, versions
2006: Nuxeo Repository is based on ZODB (Python / Zope based)
Not JSON in a NoSQL store, but Python serialization in an object DB
Concurrency and performance issues, bad transaction handling
2007: Nuxeo Platform 5.1 - Apache Jackrabbit (JCR based)
Mix of SQL + Java serialization + Lucene
Transaction and consistency issues
2009: Nuxeo 5.2 - Nuxeo VCS
SQL-based repository: MVCC & ACID
Very reliable, but some use cases cannot fit in a SQL DB!
2014: Nuxeo 5.9 - Nuxeo DBS
Document-Based Storage repository
MongoDB is the reference backend
Object DB
Document DB
SQL DB
Understanding the motivations
for moving to MongoDB
The Search API is the most used:
search is the main scalability challenge
Let's do some benchmarks of Nuxeo + MongoDB
to check that it is true!
Low-level read (fast re-indexing with elasticsearch)
3,500 documents/s using SQL backend
10,000 documents/s using MongoDB (+180%)
Raw performance
Single server
6-core HT 3.5 GHz
126 GB RAM
standard HDD
SQL
with tuning
commodity hardware
SQL
1 Nuxeo node + 1 MongoDB node
1900 docs/s
MongoDB CPU is the bottleneck (800%)
2 Nuxeo nodes + 1 MongoDB node
1850 docs/s
MongoDB CPU is the bottleneck (800%)
2 Nuxeo nodes + 2 MongoDB nodes
3400 docs/s when using read preferences
Adding a second MongoDB node adds about 80% throughput (1,850 → 3,400 docs/s)
The benchmark uses massive read operations and queries.
Yes: this kind of setup is possible with a SQL DB too
But:
setup is usually not that simple
a MongoDB replica set is easy
impacts at the Transaction Manager level
read-only routing is encapsulated in the MongoDB client
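For illustration, this kind of read routing is usually configured through the MongoDB connection string: the `readPreference` option tells the driver it may send reads to secondaries, while writes still go to the primary. The sketch below only builds such a URI; the host names, the `nuxeo` database name, the `rs0` replica set name and the `buildMongoUri` helper are hypothetical, and the benchmark slides do not say which read preference mode was used.

```javascript
// Build a MongoDB connection string that lets the driver route reads to secondaries.
// Hosts, database name "nuxeo" and replica set "rs0" are hypothetical values.
function buildMongoUri(hosts, replicaSet, readPreference) {
  // URLSearchParams keeps insertion order, so replicaSet comes before readPreference.
  const params = new URLSearchParams({ replicaSet, readPreference });
  return `mongodb://${hosts.join(",")}/nuxeo?${params.toString()}`;
}

const uri = buildMongoUri(["mongo1:27017", "mongo2:27017"], "rs0", "secondaryPreferred");
console.log(uri);
// logs mongodb://mongo1:27017,mongo2:27017/nuxeo?replicaSet=rs0&readPreference=secondaryPreferred
```

Passing such a string to `new MongoClient(uri)` from the official `mongodb` Node.js driver would apply it. Reads served by secondaries can lag slightly behind the primary, which is the usual price for this extra read throughput.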
Faster: for both reads and writes
Volume: on commodity hardware
Architecture: scale out compliant
Let's see the technical details
Inside the nuxeo-dbs storage adapter
{
"ecm:id":"52a7352b-041e-49ed-8676-328ce90cc103",
"ecm:primaryType":"MyFile",
"ecm:majorVersion":NumberLong(2),
"ecm:minorVersion":NumberLong(0),
"dc:title":"My Document",
"dc:contributors":[ "bob", "pete", "mary" ],
"dc:created": ISODate("2014-07-03T12:15:07+0200"),
...
"cust:primaryAddress":{
"street":"1 rue René Clair", "zip":"75018", "city":"Paris", "country":"France"},
"files:files":[
{ "name":"doc.txt", "length":1234, "mime-type":"text/plain",
"data":"0111fefdc8b14738067e54f30e568115"
},
{
"name":"doc.pdf", "length":29344, "mime-type":"application/pdf",
"data":"20f42df3221d61cb3e6ab8916b248216"
}
],
"ecm:acp":[
{
name:"local",
acl:[ { "grant":false, "perm":"Write", "user":"bob"},
{ "grant":true, "perm":"Read", "user":"members" } ]
}]
...
}
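Because complex and list properties stay nested inside the document, MongoDB can query them directly with dot notation, e.g. `db.default.find({"files:files.name": "doc.pdf"})`. The in-memory sketch below mimics that matching for simple equality filters so the semantics are easy to see: when the path crosses an array, every element is tried. `valuesAtPath` and `matchesEquality` are hypothetical helpers; they ignore query operators and numeric array indexes, which the real query engine supports.

```javascript
// Trimmed copy of the document above; NumberLong/ISODate values are dropped
// so the example is plain, self-contained JavaScript (assumption).
const doc = {
  "ecm:id": "52a7352b-041e-49ed-8676-328ce90cc103",
  "dc:title": "My Document",
  "dc:contributors": ["bob", "pete", "mary"],
  "cust:primaryAddress": { street: "1 rue René Clair", zip: "75018", city: "Paris", country: "France" },
  "files:files": [
    { name: "doc.txt", length: 1234, "mime-type": "text/plain" },
    { name: "doc.pdf", length: 29344, "mime-type": "application/pdf" },
  ],
};

// Collect every value reachable at a dotted path, fanning out across arrays
// (simplified MongoDB dot-notation semantics: no operators, no numeric indexes).
function valuesAtPath(value, segments) {
  if (segments.length === 0) {
    // A terminal array matches on the array itself and on each of its elements.
    return Array.isArray(value) ? [value, ...value] : [value];
  }
  if (Array.isArray(value)) {
    return value.flatMap((item) => valuesAtPath(item, segments));
  }
  if (value === null || typeof value !== "object") return [];
  const [head, ...rest] = segments;
  return head in value ? valuesAtPath(value[head], rest) : [];
}

// Equality match for a filter such as {"files:files.name": "doc.pdf"}.
function matchesEquality(document, path, expected) {
  return valuesAtPath(document, path.split(".")).some((v) => v === expected);
}

console.log(matchesEquality(doc, "files:files.name", "doc.pdf")); // logs true
console.log(matchesEquality(doc, "cust:primaryAddress.city", "Lyon")); // logs false
```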
ecm:parentId
ecm:ancestorIds
{ ...
"ecm:parentId" : "3d7efffe-e36b-44bd-8d2e-d8a70c233e9d",
"ecm:ancestorIds" : [ "00000000-0000-0000-0000-000000000000",
"4f5c0e28-86cf-47b3-8269-2db2d8055848",
"3d7efffe-e36b-44bd-8d2e-d8a70c233e9d" ]
...}
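Storing `ecm:ancestorIds` (a materialized path) means the whole subtree under a folder can be fetched with one query on an array field, such as `db.default.find({"ecm:ancestorIds": folderId})`, instead of walking the tree level by level. A minimal in-memory sketch, using hypothetical short ids instead of UUIDs and hypothetical helper names (`makeChild`, `findDescendants`):

```javascript
// A new child's ancestors are its parent's ancestors followed by the parent itself,
// so the direct parent is always the last entry (as in the example above).
function makeChild(parent, id, title) {
  return {
    "ecm:id": id,
    "dc:title": title,
    "ecm:parentId": parent["ecm:id"],
    "ecm:ancestorIds": [...(parent["ecm:ancestorIds"] || []), parent["ecm:id"]],
  };
}

// In-memory stand-in for db.default.find({"ecm:ancestorIds": folderId}):
// an equality match on an array field succeeds when any element equals the value.
function findDescendants(docs, folderId) {
  return docs.filter((d) => (d["ecm:ancestorIds"] || []).includes(folderId));
}

const root = { "ecm:id": "00000000-0000-0000-0000-000000000000", "ecm:ancestorIds": [] };
const ws = makeChild(root, "ws", "Workspaces");
const folder = makeChild(ws, "folder", "Project");
const file = makeChild(folder, "file", "My Document");
const docs = [root, ws, folder, file];

console.log(findDescendants(docs, "ws").map((d) => d["dc:title"]));
// logs [ 'Project', 'My Document' ]
```

The trade-off sits on the write side: moving a folder means rewriting `ecm:ancestorIds` for every document beneath it.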
ecm:racl: ["Management", "Supervisors", "bob"]
db.default.find({"ecm:racl": {"$in": ["bob", "members", "Everyone"]}})
{...
"ecm:acp":[ {
name:"local",
acl:[ { "grant":false, "perm":"Write", "user":"bob"},
{ "grant":true, "perm":"Read", "user":"members" } ]}]
...}
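The slides do not show how `ecm:racl` is derived from `ecm:acp`. As a rough sketch, assuming that only the literal `Read` permission matters, that ACLs are evaluated in list order, and that the first ACE seen for a principal decides (so a deny blocks later grants), the computation and the `$in` check could look like this. `computeReadAcl` and `canRead` are hypothetical names; Nuxeo's real logic also handles permission groups and ACLs inherited from parent documents.

```javascript
// Simplified read-ACL computation (assumption: only "Read" counts, first ACE per principal wins).
function computeReadAcl(acp) {
  const decided = new Set(); // principals already granted or denied Read
  const readers = [];
  for (const acl of acp) {
    for (const ace of acl.acl) {
      if (ace.perm !== "Read" || decided.has(ace.user)) continue;
      decided.add(ace.user);
      if (ace.grant) readers.push(ace.user);
    }
  }
  return readers;
}

// In-memory equivalent of {"ecm:racl": {"$in": principals}}: any shared principal grants access.
function canRead(racl, principals) {
  return principals.some((p) => racl.includes(p));
}

const acp = [{ name: "local", acl: [
  { grant: false, perm: "Write", user: "bob" },
  { grant: true, perm: "Read", user: "members" },
] }];
const racl = computeReadAcl(acp);
console.log(racl); // logs [ 'members' ]
console.log(canRead(racl, ["bob", "members", "Everyone"])); // logs true
```

Pre-computing `ecm:racl` at write time turns the security check into one indexable `$in` clause that can be ANDed into any query, instead of evaluating ACLs document by document at read time.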
db.default.find({
$and: [
{"dc:title": { $in: ["Workspaces", "Sections"] } },
{"ecm:racl": {"$in": ["bob", "members", "Everyone"]}}
]
}
)
SELECT * FROM Document WHERE dc:title = 'Sections' OR dc:title = 'Workspaces'
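The NXQL query and the MongoDB query express the same title filter: the two `OR`ed equalities on `dc:title` collapse into one `$in`, and the read ACL clause is added with `$and`. A hypothetical sketch of that translation step for this single pattern (`buildSecuredQuery` is an illustrative name, not the actual nuxeo-dbs query builder):

```javascript
// Translate "field = v1 OR field = v2 ..." into a MongoDB filter and always AND in
// the read-ACL clause, so users never see documents they are not allowed to read.
function buildSecuredQuery(field, values, principals) {
  const criteria = values.length === 1
    ? { [field]: values[0] }          // single equality needs no $in
    : { [field]: { $in: values } };   // OR of equalities on one field
  return { $and: [criteria, { "ecm:racl": { $in: principals } }] };
}

const query = buildSecuredQuery("dc:title", ["Workspaces", "Sections"], ["bob", "members", "Everyone"]);
console.log(JSON.stringify(query));
// logs {"$and":[{"dc:title":{"$in":["Workspaces","Sections"]}},{"ecm:racl":{"$in":["bob","members","Everyone"]}}]}
```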
Find a way to mitigate consistency issues
Transactions cannot span multiple documents
Typical use cases
Use each storage solution for what it does the best
SQL DB
store content in an ACID way
consistency over availability
MongoDB
store content in a BASE way
availability over consistency
elasticsearch
provide powerful and scalable queries
Storage does not impact the application: this can be a deployment choice!
Atomic Consistent
Isolated Durable
Basic Availability
Soft state
Eventually consistent
SQL DB collapses (on commodity hardware)
MongoDB handles the volume
Read & Write Operations
are competing
Write Operations
are not blocked
c4.xlarge (Nuxeo)
c4.2xlarge (DB)
SQL
Low-level import on AWS
about 5x faster!
MongoDB has no impedance mismatch
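To make "impedance mismatch" concrete, the sketch below counts how many rows a hypothetical, fully normalized SQL mapping would need for the sample document shown earlier (one row per object, one row per array element, scalars stored in their parent row), compared with a single MongoDB document. These mapping rules are an assumption for illustration only; they are not the actual Nuxeo VCS schema and do not by themselves explain the measured speedup.

```javascript
// Hypothetical normalized mapping (NOT Nuxeo VCS's real schema).
function isPlainObject(v) {
  return v !== null && typeof v === "object" && !Array.isArray(v);
}

// One row for the object itself, plus the rows needed by its nested values.
function rowsForObject(obj) {
  let rows = 1;
  for (const value of Object.values(obj)) rows += rowsForValue(value);
  return rows;
}

// Rows needed by a property value: 0 for a scalar, one per array element, recursive for objects.
function rowsForValue(value) {
  if (Array.isArray(value)) {
    return value.reduce((n, item) => n + (isPlainObject(item) ? rowsForObject(item) : 1), 0);
  }
  return isPlainObject(value) ? rowsForObject(value) : 0;
}

const doc = {
  "ecm:id": "52a7352b-041e-49ed-8676-328ce90cc103",
  "dc:title": "My Document",
  "dc:contributors": ["bob", "pete", "mary"],
  "cust:primaryAddress": { street: "1 rue René Clair", zip: "75018", city: "Paris", country: "France" },
  "files:files": [
    { name: "doc.txt", length: 1234, "mime-type": "text/plain" },
    { name: "doc.pdf", length: 29344, "mime-type": "application/pdf" },
  ],
  "ecm:acp": [{ name: "local", acl: [
    { grant: false, perm: "Write", user: "bob" },
    { grant: true, perm: "Read", user: "members" },
  ] }],
};

console.log(`SQL rows: ${rowsForObject(doc)}, MongoDB documents: 1`);
// logs SQL rows: 10, MongoDB documents: 1
```

Under this assumed mapping, each extra row is an extra INSERT (plus index maintenance) inside the SQL transaction, whereas MongoDB writes the whole tree in one operation.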
Side effects of impedance mismatch
6,000 documents/s with MongoDB/MMAPv1: x9
11,000 documents/s with MongoDB/WiredTiger: x15
9,500 documents/s with MongoDB/MMAPv1: x13
11,500 documents/s with MongoDB/WiredTiger: x15
14,000 documents/s with MongoDB/MMAPv1: x18
11,000 documents/s with MongoDB/WiredTiger: x15
processing benchmark
based on a real use case
native distributed architecture
active / active
They chose Nuxeo to build their Video repository
looks like a good use case for MongoDB
lots of data + lots of updates
they chose MongoDB
they are happy with it!
Going further with MongoDB
Thank You!
https://github.com/nuxeo
http://www.nuxeo.com/careers/