What We Do and What Problems We Try to Solve
Track game builds
Electronic Flight Bags
Central repository for Models
Food industry PLM
https://github.com/nuxeo
Heavily configurable: all data structures are flexible and customizable
Used by developers to build Content Applications on top of the Nuxeo Repository
The Search API is the most used API:
search is the main scalability challenge
2006: Nuxeo CPS 3.6
(Python / Zope based)
Replace the built-in index with Lucene + an XML-RPC server
PyLucene (GCJ build + Python bindings!)
Complex setup
2007: Nuxeo Platform 5.1
JCR: issues with queries (and backup)
Integrate Compass Core
transactional & storage abstraction
Missing sync & concurrency issues
2009: Nuxeo 5.2
VCS: homebrew SQL-based repository
Search in the database, but with some real limitations
2013 / 2014: Nuxeo 5.9.3
Reintroduce Lucene into the stack via elasticsearch
Learn from our past mistakes
... we are now happy with Elasticsearch
Lucene and Nuxeo have a long history...
Understanding the Issue
The Search API is the most used API:
search is the main scalability challenge
SELECT "hierarchy"."id" AS "_C1" FROM "hierarchy"
JOIN "fulltext" ON "fulltext"."id" = "hierarchy"."id"
LEFT JOIN "misc" "_F1" ON "hierarchy"."id" = "_F1"."id"
LEFT JOIN "dublincore" "_F2" ON "hierarchy"."id" = "_F2"."id"
WHERE
("hierarchy"."primarytype" IN ('Video', 'Picture', 'File', 'Audio'))
AND ((TO_TSQUERY('english', 'sydney') @@ NX_TO_TSVECTOR("fulltext"."fulltext")))
AND ("hierarchy"."isversion" IS NULL)
AND ("_F1"."lifecyclestate" <> 'deleted')
AND ("_F2"."created" IS NOT NULL)
ORDER BY "_F2"."created" DESC
LIMIT 201 OFFSET 0;
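For contrast, the same search can be written as a single Elasticsearch request with no joins at all. The sketch below builds such a request body as a Python dict; it assumes the index stores NXQL-style property names (`ecm:fulltext`, `ecm:primaryType`, `ecm:isVersion`, `ecm:currentLifeCycleState`, `dc:created`) as flat fields, and it uses the `bool` query with a `filter` clause from current Elasticsearch versions (1.x expressed filters differently). It is an illustration, not the query nuxeo-elasticsearch actually generates.

```python
def build_search_body(text, types, limit=201, offset=0):
    """Build an Elasticsearch request body roughly equivalent to the SQL above.

    Field names mirror NXQL properties; the index mapping is an assumption.
    """
    return {
        "from": offset,   # OFFSET
        "size": limit,    # LIMIT
        "query": {
            "bool": {
                # Full-text match replaces TO_TSQUERY / NX_TO_TSVECTOR and the fulltext JOIN
                "must": [{"match": {"ecm:fulltext": text}}],
                # Non-scoring filters replace the WHERE clauses that needed LEFT JOINs
                "filter": [
                    {"terms": {"ecm:primaryType": types}},
                    {"exists": {"field": "dc:created"}},
                ],
                # Exclude versions and deleted documents
                "must_not": [
                    {"term": {"ecm:isVersion": True}},
                    {"term": {"ecm:currentLifeCycleState": "deleted"}},
                ],
            }
        },
        "sort": [{"dc:created": {"order": "desc"}}],  # ORDER BY created DESC
    }
```

Calling `build_search_body("sydney", ["Video", "Picture", "File", "Audio"])` returns a JSON-serializable dict that can be sent as the body of a `_search` request: every document-level property lives in one denormalized JSON document, so the joins disappear.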
SQL technology is not the solution
SQL or NoSQL repositories are not the solution
Toward a Hybrid Storage
Use each storage solution for what it does best
SQL DB
store content in an ACID way
store & retrieve
queries that need ACID and MVCC
elasticsearch
provide powerful and scalable queries
do the heavy lifting that the RDBMS cannot do
scoring, native full-text, aggregates
distributed search
Route the query to the correct index depending on requirements
One query
Several possible backends
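The routing idea (one query, several possible backends) can be sketched as a small dispatch function. The `Query` type, its `needs_consistency` flag and the `route` rule below are hypothetical, chosen only to show the principle: queries that must see the latest committed (or in-transaction) state go to the SQL database, everything else goes to elasticsearch.

```python
from dataclasses import dataclass
from enum import Enum


class Backend(Enum):
    SQL = "sql"
    ELASTICSEARCH = "elasticsearch"


@dataclass
class Query:
    nxql: str
    # Hypothetical flag: the caller needs ACID/MVCC visibility of its own writes
    needs_consistency: bool = False


def route(query: Query) -> Backend:
    """Pick a backend for one query (hypothetical routing rule)."""
    if query.needs_consistency:
        # Only the transactional store sees the current transaction's changes
        return Backend.SQL
    # Everything else goes to the search index, which scales out horizontally
    return Backend.ELASTICSEARCH
```

For example, `route(Query("SELECT * FROM Document", needs_consistency=True))` returns `Backend.SQL`, while the same query without the flag is sent to `Backend.ELASTICSEARCH`. Keeping the decision in one place means application code writes a single query and the platform chooses where to run it.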
Fast indexing
No ACID constraints / No impedance mismatch
3,500 documents/s with the SQL backend
10,000 documents/s with the MongoDB backend
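One common way to index many documents quickly is to batch them through the Elasticsearch `_bulk` endpoint, whose body is newline-delimited JSON: an action line followed by the document source, with a mandatory trailing newline. The sketch below builds such a body; the index name `nuxeo` and the document shape are assumptions, older Elasticsearch versions also expect a `_type` in the action line, and this is not a description of how nuxeo-elasticsearch batches its writes.

```python
import json


def build_bulk_body(index, docs):
    """Build a newline-delimited body for the Elasticsearch _bulk endpoint.

    `docs` maps a document id to its JSON-serializable source.
    """
    lines = []
    for doc_id, source in docs.items():
        # Action/metadata line, then the document source on the next line
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline
    return "\n".join(lines) + "\n"
```

Because the index is not the system of record, a failed batch can simply be replayed from the database, which is what lets indexing skip ACID constraints.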
Super query performance
term queries use an inverted index
very efficient caching
native full-text support & distributed architecture
3,000 queries/s with 1 elasticsearch node
6,000 queries/s with 2 elasticsearch nodes
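The speed of term queries comes from the inverted index: instead of scanning rows, the engine looks up each term's posting list (the set of documents containing it) and intersects those lists. The toy class below illustrates the data structure only; it splits text on whitespace and has none of Lucene's analysis, scoring or compression.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index: maps each term to the set of document ids containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Naive analysis: lowercase and split on whitespace
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, *terms):
        """Return ids of documents containing all terms (AND semantics)."""
        if not terms:
            return set()
        # .get avoids inserting empty posting lists for unknown terms
        lists = [self.postings.get(t.lower(), set()) for t in terms]
        return set.intersection(*lists)
```

After `idx.add(1, "Sydney harbour video")`, `idx.add(2, "Sydney opera picture")` and `idx.add(3, "Paris picture")`, `idx.search("sydney")` returns `{1, 2}` and `idx.search("sydney", "picture")` returns `{2}`. The cost depends on the size of the posting lists involved, not on the total number of documents, and the index can be split across nodes, which is consistent with the throughput doubling from one to two nodes.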
Customer: We are now testing the Nuxeo 6 stack in AWS. The DB is PostgreSQL on a db.r3.8xlarge, which has 32 CPUs. Between 350 and 400 tps, the DB CPU is maxed out.
Nuxeo support: Please activate nuxeo-elasticsearch!
Customer: We are now able to do about 1,200 tps with almost zero DB activity. Question though: Nuxeo and ES do not seem to be maxed out?
Nuxeo support: It looks like you have some network congestion between your client and the servers.
Customer: ...right... we have pushed past 1,900 tps... I think we are close to declaring success for this configuration...
Scalability is simply of another order of magnitude
For users
it really looks like magic
For sales guys & solution architects
it is magic: it unleashes a lot of possibilities
performance is just one aspect
For the Nuxeo Core Dev team
it was almost magic: some integration work was needed
Inside the nuxeo-elasticsearch Plugin
But elasticsearch brings us much more than just scalability
More than Raw Speed
-- Use an explicit Elasticsearch field
SELECT * FROM Document WHERE /*+ES: INDEX(dc:title.ngram) */ dc:title = 'foo'
-- Use ES operators not present in NXQL
SELECT * FROM Document WHERE /*+ES: OPERATOR(regex) */ dc:title = 's.*y'
SELECT * FROM Document WHERE /*+ES: OPERATOR(fuzzy) */ dc:title = 'zorkspaces'
-- Use ES for a geo query: documents whose location falls in the geohash cell around a point
SELECT * FROM Document WHERE /*+ES: OPERATOR(geo_hash_cell) */ osm:location IN ('40','-74','5')
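To make the hint mechanism concrete, here is a minimal sketch of how a `/*+ES: ... */` comment could be parsed and applied to a `field = value` clause. The regexes, the function names `parse_es_hint` and `to_es_clause`, and the choice of a `term` query as the default translation of `=` are assumptions for illustration; only `regexp`, `fuzzy` and `term` are real Elasticsearch query types, and the actual plugin handles more operators (such as `geo_hash_cell`) than this sketch does.

```python
import re

# Matches the hint comment, e.g. /*+ES: INDEX(dc:title.ngram) OPERATOR(fuzzy) */
HINT_RE = re.compile(r"/\*\+ES:\s*(.*?)\s*\*/")
# Matches each directive inside the hint, e.g. INDEX(dc:title.ngram)
DIRECTIVE_RE = re.compile(r"(INDEX|OPERATOR)\(([^)]*)\)")


def parse_es_hint(nxql):
    """Extract INDEX and OPERATOR directives from an NXQL /*+ES: ... */ hint.

    Returns e.g. {"INDEX": "dc:title.ngram"}; empty dict if there is no hint.
    """
    match = HINT_RE.search(nxql)
    if not match:
        return {}
    return {name: arg.strip() for name, arg in DIRECTIVE_RE.findall(match.group(1))}


def to_es_clause(field, value, hint):
    """Translate `field = value` into an ES query clause, honoring the hint."""
    field = hint.get("INDEX", field)          # explicit ES field overrides the NXQL one
    operator = hint.get("OPERATOR", "term")   # assumed default for '='
    if operator == "regex":
        return {"regexp": {field: value}}
    if operator == "fuzzy":
        return {"fuzzy": {field: value}}
    if operator == "term":
        return {"term": {field: value}}
    raise ValueError(f"operator not handled in this sketch: {operator}")
```

With the fuzzy example above, the hint yields `{"OPERATOR": "fuzzy"}` and the clause becomes `{"fuzzy": {"dc:title": "zorkspaces"}}`; the NXQL query itself stays valid for backends that ignore the comment.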
leverage what comes for free with elasticsearch
Easy real-time data analytics on business data
Queries on Documents + Audit: flexible reporting on workflows
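As an example of the analytics side, a workflow report such as "how many documents of a given type are in each lifecycle state" is a single aggregation request. The sketch assumes the index exposes `ecm:primaryType` and `ecm:currentLifeCycleState` as keyword fields; the helper names are hypothetical, while the `terms` aggregation and the `aggregations.<name>.buckets` response shape are standard Elasticsearch.

```python
def lifecycle_report_body(doc_type, size=10):
    """Request body counting documents of one type per lifecycle state.

    Top-level size=0 asks for aggregation results only, no document hits.
    """
    return {
        "size": 0,
        "query": {"term": {"ecm:primaryType": doc_type}},
        "aggs": {
            "by_state": {
                # One bucket per distinct lifecycle state, with its document count
                "terms": {"field": "ecm:currentLifeCycleState", "size": size}
            }
        },
    }


def buckets_to_counts(response):
    """Turn the aggregation part of a search response into {state: count}."""
    buckets = response["aggregations"]["by_state"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}
```

The counts are computed by the search cluster from the index, so reports like this run without adding load to the SQL database.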
_source
Leveraging Even More elasticsearch
Thank You!
https://github.com/nuxeo
http://www.nuxeo.com/careers/