Thierry Delprat
tdelprat@nuxeo.com
https://github.com/tiry/
Understanding the challenges & use cases
Storing objects (think JSON objects)
Custom Domain Model
Conversions & Previews
Security Policies
on any field
At SCALE
Application Log
Choose storage backends according to challenges
Store structures in a SQL database
or store structures in MongoDB
Store streams in MongoDB too
Store streams in S3
Leverage
Google Drive & Google Doc integration
Search
Storage / Import
Async processing
the first challenge
Offload search queries to
One query
Several possible backends
Asynchronous batched updates
Ensure we cannot lose any update
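The two points above (asynchronous batched updates, no lost update) can be sketched as follows. This is a minimal illustration, not the Nuxeo implementation: `BatchedIndexer`, `bulk_index` and `fake_bulk_index` are hypothetical names, and an in-memory deque stands in for a durable queue. The key idea is that updates leave the queue only after the search backend has accepted the batch.

```python
from collections import deque


class BatchedIndexer:
    """Buffers document updates and sends them to a search backend in batches."""

    def __init__(self, bulk_index, batch_size=2):
        self.bulk_index = bulk_index  # hypothetical callable taking a list of updates
        self.batch_size = batch_size
        self.pending = deque()        # stands in for a durable queue (assumption)

    def submit(self, doc_id, fields):
        """Record an update; the caller does not wait for indexing."""
        self.pending.append((doc_id, fields))

    def flush(self):
        """Send one batch; dequeue the updates only after the backend accepted them."""
        size = min(self.batch_size, len(self.pending))
        batch = [self.pending[i] for i in range(size)]
        if not batch:
            return 0
        self.bulk_index(batch)        # may raise: nothing has been dequeued yet
        for _ in batch:
            self.pending.popleft()
        return len(batch)


# Usage: a fake backend that fails once, then recovers.
index = {}
state = {"fail_next": True}


def fake_bulk_index(batch):
    if state["fail_next"]:
        state["fail_next"] = False
        raise ConnectionError("search backend unavailable")
    for doc_id, fields in batch:
        index[doc_id] = fields


indexer = BatchedIndexer(fake_bulk_index, batch_size=2)
for i in range(3):
    indexer.submit(f"doc-{i}", {"title": f"Document {i}"})

try:
    indexer.flush()
except ConnectionError:
    pass  # the failed batch is still pending and will be retried

while indexer.pending:
    indexer.flush()
```

Retrying a batch can deliver the same update twice, so this sketch assumes indexing an update is idempotent (writing the same document again gives the same result).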
Big document databases and big imports?
No Impedance issue
fewer backend calls
no invalidation cost
Document level locking
no table level concurrency
Native distributed architecture
Easy scale out of read
Document level transactions
No MVCC isolation
Provide shared mitigation policies
for critical use cases
Different transaction paradigm
Transient State Manager
Run all operations in Memory
Populate an Undo Log
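One way to read the transient state and undo log idea is sketched below. This is an assumption-laden illustration, not Nuxeo's actual code: `TransientStateManager`, `FlakyStore` and the dict-based store are hypothetical. Changes are held in memory, written document by document at flush time, and each successful write records the previous state so that a failure partway through can be undone.

```python
import copy


class TransientStateManager:
    """Keeps changes in memory and undoes partial flushes with an undo log."""

    def __init__(self, store):
        self.store = store    # dict-like: doc_id -> document (stands in for the backend)
        self.transient = {}   # in-memory changes not yet written
        self.undo_log = []    # (doc_id, previous document or None) per applied write

    def read(self, doc_id):
        """Read through the transient state first, then the store."""
        if doc_id in self.transient:
            return self.transient[doc_id]
        return self.store.get(doc_id)

    def write(self, doc_id, doc):
        """Record the change in memory only."""
        self.transient[doc_id] = doc

    def flush(self):
        """Apply changes one document at a time; undo all of them on failure."""
        try:
            for doc_id, doc in self.transient.items():
                previous = copy.deepcopy(self.store.get(doc_id))
                self.store[doc_id] = doc                 # document-level write, may raise
                self.undo_log.append((doc_id, previous))
        except Exception:
            self.rollback()
            raise
        self.transient.clear()
        self.undo_log.clear()

    def rollback(self):
        """Replay the undo log in reverse order and drop the in-memory changes."""
        while self.undo_log:
            doc_id, previous = self.undo_log.pop()
            if previous is None:
                self.store.pop(doc_id, None)
            else:
                self.store[doc_id] = previous
        self.transient.clear()


# Usage: a store that rejects one document, to force a rollback.
class FlakyStore(dict):
    def __setitem__(self, key, value):
        if key == "doc-b":
            raise IOError("write failed")
        super().__setitem__(key, value)


store = FlakyStore({"doc-a": {"title": "old"}})
tsm = TransientStateManager(store)
tsm.write("doc-a", {"title": "new"})
tsm.write("doc-b", {"title": "other"})
seen_before_flush = tsm.read("doc-a")

try:
    tsm.flush()
except IOError:
    pass  # doc-a was written, then restored from the undo log
```

Other readers of the store can observe the intermediate write before it is undone, which matches the "no MVCC isolation" caveat above.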
Significant raw speed improvements for most use cases
More importantly: some use cases are simply better handled
https://benchmarks.nuxeo.com/
Side effects of no-cache
No Cache
Less memory per Connection
More connections
More Concurrent Users
Processing large document sets is an issue on SQL
Side effects of impedance mismatch
Sample Nuxeo batch on 100,000 documents
750 documents/s with SQL backend
(cold cache)
11,500 documents/s with MongoDB / WiredTiger: x15
lazy loading
cache thrashing
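To put the batch figures in perspective, the quoted throughputs can be turned into wall-clock time for the 100,000-document batch. The short calculation below uses only the numbers given above; the variable names are illustrative.

```python
# Figures quoted for the sample batch.
documents = 100_000
sql_rate = 750        # documents/s, SQL backend, cold cache
mongo_rate = 11_500   # documents/s, MongoDB / WiredTiger

sql_seconds = documents / sql_rate
mongo_seconds = documents / mongo_rate
speedup = mongo_rate / sql_rate

print(f"SQL:     {sql_seconds:.0f} s")    # about 133 s
print(f"MongoDB: {mongo_seconds:.1f} s")  # about 8.7 s
print(f"Speedup: x{speedup:.1f}")         # about x15.3
```

In other words, the same batch goes from a little over two minutes to under ten seconds.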
Read & write operations are competing
Write operations are not blocked
c4.xlarge (Nuxeo)
c4.2xlarge (DB)
SQL
READ + WRITE
Side effects of hyper-scaling the repository
Millions of documents
Indexing Jobs
Conversion Jobs
Audit update
Use shared Queues to manage Jobs
Already used for
Cache & Invalidations
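The shared-queue idea can be sketched with the standard library alone. This is an illustration under stated assumptions: a `queue.Queue` and threads stand in for a queue shared across cluster nodes, and the names (`jobs`, `worker`, `node-0`, the job kinds) are hypothetical. Each job is handled once, by whichever worker is free.

```python
import queue
import threading

jobs = queue.Queue()            # stands in for a queue shared by all nodes (assumption)
results = []                    # (worker name, job kind, document id)
results_lock = threading.Lock()


def worker(name):
    """Pull jobs until a None sentinel tells this worker to stop."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        kind, doc_id = job
        with results_lock:
            results.append((name, kind, doc_id))
        jobs.task_done()


workers = [threading.Thread(target=worker, args=(f"node-{i}",)) for i in range(3)]
for w in workers:
    w.start()

# Each document update fans out into the job types listed above.
for doc_id in range(5):
    for kind in ("indexing", "conversion", "audit"):
        jobs.put((kind, doc_id))

for _ in workers:
    jobs.put(None)              # one stop sentinel per worker

jobs.join()
for w in workers:
    w.join()
```

Adding workers raises job throughput without touching the producers, which is the property that matters when the repository holds millions of documents.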
Good match for Complex structures + Atomic API + Speed
Results and lessons learned
Initial import
SOON
https://benchmarks.nuxeo.com/
How to deploy that?
Nuxeo Cluster
MongoDB replica set
ES Cluster
Redis + Sentinel
Kafka Cluster
ZK Cluster
This is complex!
Makes it easier.
Leverage various storage sub-systems.
Thank you!
https://github.com/nuxeo
http://www.nuxeo.com/careers/