Nuxeo Platform
Architecture Overview

 

Thierry Delprat

td@nuxeo.com
https://github.com/tiry/

Some Context

About Nuxeo Platform

Nuxeo

we provide a Platform that developers can use to
build highly customized Content Applications

we provide components, and the tools to assemble them

everything we do is open source

 

various customers - various use cases 




 

Track game builds

Electronic Flight Bags

Central repository for Models

Food industry PLM

https://github.com/nuxeo

Nuxeo Document 

Storing objects (think {JSON} object)

  • Schemas
     
  • Streams
     
  • Security
     
  • Search
     
  • Audit

 

Custom Domain Model

Conversions & Previews

Security Policies

on any field

At SCALE

Application Log

Nuxeo Platform

Repository









 

Services

  • Workflows, Conversions, Diff, Notifications, Activity  ...

Architecture goal

Custom tailored Business application

Application must evolve with Business

Architecture goal

  • Goals
    • Let you focus on business logic
    • Allow advanced configuration 
    • Ensure maintenance
    • Provide clean upgrade path
       
  • Principles
    • Provide an Extensible Component Model
    • Ensure clear separation between
      • custom code
      • Nuxeo-provided code
    • Provide the test infrastructure

Architecture

Principles and technologies used

Component Model

  • In the Nuxeo architecture, everything is a plugin
  • Everything is configurable
    • Logic and data structures depend on configuration

Plugins everywhere

One plugin model for

  • all layers 
  • for the platform
  • for your custom components

Plugins At Storage level

About technologies

  • Core infrastructure
    • Java "à la OSGi"
  • Contributions
    • Java, JavaScript, XML
  • Third party technologies
    • SQL Storage: PostgreSQL, Oracle, MySQL, MS SQL
    • NoSQL Storage: MongoDB / MarkLogic
    • Binary Storage: S3, GridFS, Google Drive
    • Indexing: Elasticsearch
    • Queuing: Chronicle Queue, Kafka
    • Cache and Shared structures: Redis

About technologies

  • REST API
  • Web UI
    • WebComponents / Polymer UI (nuxeo-elements)
    • Legacy JSF2 back office
    • JAX-RS / Freemarker for custom portal
  • Mobile Application
    • ReactNative based customizable Mobile application
  • Client libs 

FrontEnd - Clients

About technologies

Build - Test - Ship - Deploy

About technologies

  • Testing
    • JUnit for unit tests
      • Feature runner to deploy Nuxeo bundles
      • No need for mocks
    • WebDriver & Selenium for functional testing
    • Gatling & Funkload for performance testing

  • Build & Automation
    • Maven for build, dependencies and running tests
    • Jenkins for CI
    • Ansible 

About technologies

  • Packaging and deployment
    • Debian packages
    • Nuxeo packages
    • VM images
    • Docker images
       
  • Cloud
    • AWS deployment models (Terraform and CloudFormation)
    • K8S / OpenShift templates
    • Ansible Playbook Bundles
       

Customization

Building a business App with Nuxeo

About Extension points

  • 260 Bundles
    • 200+ Components
      • 180+ Services exposed
      • 280+ Extension points
    •  1000+ Contributions

Pretty much everything inside the Platform can be customized

Anatomy of a Nuxeo based Application

  • Easy maintenance and upgrade

    • Clear separation between infrastructure provided by Nuxeo
      and the custom components

    • Nuxeo Studio configuration is transparently upgraded

Building with Nuxeo

Continuous deployment

Reusing Customization

Create a new Addon

  • This addon is a first class citizen

    • It can receive additional configuration

    • It can be mixed with other addons

Reusing Customization

  • You can use Studio to configure your addons

    • override or extend domain model

    • extend default configuration

Reusing Customization

  • Share common code and configuration

  • Allow 2nd level customization

Studio Branches

  • Share common code

  • Different branches
    of configuration

Repository

Nuxeo Storage Model

Nuxeo Document

  • a “Document” is not a simple file

    • one document = a persistent object with properties (String, Date, File, Complex types ...)

    • properties are defined by Schemas

  • Document types

    • a document type is defined by a set of schemas, inheritance is supported

  • Lifecycle

    • document type is associated with states and transitions

  • Facets

    • can be used to associate behavior (Folderish, Hidden, Commentable …)

    • can be associated with a schema (Mixins) and with a Business Object adapter

Nuxeo Document

  • Document Schemas are based on XSD







     

  • a field can be a Binary Stream

  • a field can have constraints

    • required, validation pattern

    • reference to a document or a user

    • custom constraint

Scalar properties and arrays:
    dc:title = "My Document"
    dc:contributors = ["bob", "pete", "mary"]
    dc:created = 2014-07-03T12:15:07+0200
    ecm:uuid = 52a7352b-041e-49ed-8676-328ce90cc103

Complex properties and lists of them:
    primaryAddress = { street = "1 rue René Clair", zip = "75018",
                      city = "Paris", country = "France" }
    files = [
     { name = "doc.txt", length = 1234, mime-type = "text/plain",
      data = 0111fefdc8b14738067e54f30e568115 },
     { name = "doc.pdf", length = 29344, mime-type = "application/pdf",
      data = 20f42df3221d61cb3e6ab8916b248216 }
     ]
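The property layout above maps naturally onto nested dictionaries and lists. The following is a minimal sketch, assuming a plain in-memory representation; the `get_property` helper and its slash-separated path syntax are illustrative and are not the Nuxeo API.

```python
from typing import Any

# Hypothetical in-memory view of the document shown above.
doc: dict[str, Any] = {
    "dc:title": "My Document",
    "dc:contributors": ["bob", "pete", "mary"],
    "primaryAddress": {"street": "1 rue René Clair", "zip": "75018",
                       "city": "Paris", "country": "France"},
    "files": [
        {"name": "doc.txt", "length": 1234},
        {"name": "doc.pdf", "length": 29344},
    ],
}


def get_property(document: dict[str, Any], path: str) -> Any:
    """Walk a slash-separated path; segments index lists by position."""
    value: Any = document
    for segment in path.split("/"):
        value = value[int(segment)] if isinstance(value, list) else value[segment]
    return value
```

For instance, `get_property(doc, "files/1/name")` walks into the second entry of the complex list, and `get_property(doc, "primaryAddress/city")` reads a field of a complex property.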

Storage Adapters

Hybrid Storage Architecture

  • MongoDB

    • store structures & streams in a BASE way

  • Elasticsearch

    • provide powerful and scalable queries

  • SQL DB

    • store structures in an ACID way

Storage does not impact the application: this can be a deployment choice!

ACID: Atomic, Consistent, Isolated, Durable

BASE: Basic Availability, Soft state, Eventually consistent

The choice depends on availability & performance requirements

  • Send queries to the repository (here SQL), or send queries to Elasticsearch
  • Store structures in a SQL database, or store structures in MongoDB
  • Store streams in MongoDB too, or store streams in S3
  • HSM
  • Leverage Google Drive & Google Docs integration
  • Can mix all storage types
  • The audit log too can be configured to use Elasticsearch

Blob Store

  • BlobStorage is pluggable
    • several implementations available
      • FS, SQL, GridFS
      • S3, Azure, JCloud
      • Google Drive, Dropbox
  • easy to build a new implementation as needed
    • new Cloud Storage, HDFS, GlusterFS, Ceph...

BlobManager

  • Blob Storage is externalized
    • leverage external storage
    • leverage existing repository - no migration
  • Leverage Nuxeo features
    • Indexing including full text extracted from the Blob
    • Security
    • common meta-data
  • BlobManager / LiveConnect is a framework
    • SPI
    • Security model
    • Importer  

BlobManager and External Repositories

Model &
Data Structures

  • loaded from Extension Points (startup time)

  • SQL (ORM-like mapper)
    • Tables are created at startup time according to configuration
    • Fields removed from configuration are simply ignored (no data loss)
    • Added fields are added to the tables; old entries get default values
    • In case of an incompatible change, an error is raised at startup time

  • NoSQL
    • No schema check at startup
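The SQL startup rules described above (new fields are added, removed fields are ignored, incompatible changes are rejected) can be sketched as a small planning function. This is an illustration only: `plan_schema_update`, the type names and the emitted action strings are assumptions, not the actual Nuxeo mapper.

```python
def plan_schema_update(configured: dict[str, str],
                       existing: dict[str, str]) -> list[str]:
    """Return the actions needed to align a table with the configuration.

    Both arguments map a field name to a type name.
    """
    actions = []
    for field, field_type in configured.items():
        if field not in existing:
            # New field: add a column; old rows get the default value.
            actions.append(f"ADD COLUMN {field} {field_type}")
        elif existing[field] != field_type:
            # Incompatible change: refuse to start.
            raise ValueError(
                f"incompatible change on {field}: "
                f"{existing[field]} -> {field_type}")
    # Columns that are no longer configured are left alone (no data loss).
    return actions
```

A NoSQL backend would skip this step entirely, since no schema check happens at startup.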

Search

Searching the repository

Principles

  • All Document properties are indexed (default)
    • can query on any field

  • Search always includes security filtering
    • at a low level to allow scalability

  • Sort, Order by and batching are supported

  • One or several full-text indexes

  • One query language: NXQL

 

SEARCH

SELECT * FROM Document WHERE dc:title = 'Sections' OR dc:title = 'Workspaces'
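Queries like the one above are usually assembled by client code. The sketch below builds the same NXQL string from a list of titles; the helper names are hypothetical, and the backslash escaping of quotes is an assumption about NXQL string literals that should be checked against the NXQL reference.

```python
def nxql_literal(value: str) -> str:
    """Quote a string for NXQL, backslash-escaping quotes (assumed rule)."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def title_query(titles: list[str]) -> str:
    """Build SELECT * FROM Document WHERE dc:title = ... OR ..."""
    if not titles:
        raise ValueError("at least one title is required")
    clauses = " OR ".join(f"dc:title = {nxql_literal(t)}" for t in titles)
    return f"SELECT * FROM Document WHERE {clauses}"
```

Calling `title_query(["Sections", "Workspaces"])` reproduces the query shown on the slide.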

LEVERAGE ELASTICSEARCH

  • Fast indexing

    • No ACID constraints / No impedance issue
       

    • 3,500 documents/s when using SQL backend

    • 10,000 documents/s when using MongoDB
       

  • Super query performance

    • query on term using inverted index

    • very efficient caching  

    • native full text support &  distributed architecture
       

    • 3,000 queries/s with 1 elasticsearch node

    • 6,000 queries/s with 2 elasticsearch nodes 

Advanced indexing

  • Fine tuning of elasticsearch indexing
    • multi language support using multiple analyzers and copy_to
    • compound fields created using groovy scripts
  • Introduce Elasticsearch hints into NXQL
    • select a specific Elasticsearch index / analyzer
    • leverage Elasticsearch operators
    • do geolocation search
-- Use an explicit Elasticsearch field
SELECT * FROM Document WHERE /*+ES: INDEX(dc:title.ngram) */ dc:title = 'foo'
-- Use ES operators not present in NXQL
SELECT * FROM Document WHERE /*+ES: OPERATOR(regex) */ dc:title = 's.*y'
SELECT * FROM Document WHERE /*+ES: OPERATOR(fuzzy) */ dc:title = 'zorkspaces'
-- Use ES for a geo query: match locations near a point using geo_hash_cell
SELECT * FROM Document WHERE /*+ES: OPERATOR(geo_hash_cell)*/ osm:location IN ('40','-74','5')

leverage what comes for free with elasticsearch

Leverage Aggregates

  • Leverage Elasticsearch aggregates
    • integrate with the Query system (PageProvider)
    • integrate with the Listing / UI model (ContentView)
  • Make it easy to build and configure faceted search

Elasticsearch PASS-Through

  • Expose an HTTP pass-through API on top of the Nuxeo integration
    • Integrate Authentication & Authorization
      • not all users can access the workflow index
    • Integrate Security Filtering
      • activate data-level security filtering
    • Expose a "virtual index" via HTTP
      • index + filter

  • Use Elasticsearch API related components on Nuxeo data
    • Documents + Audit log
    • With embedded security

Easy real time data analytics on business data

Elasticsearch mapping

  • Pluggable document mapping 
    • can denormalize relations
    • can leverage the relation engine at indexing time
       
  • Use Elasticsearch hints to query on denormalized index  

ASYNC INDEXING FLOW

PSEUDO-SYNC INDEXING FLOW

Ingestion

Importing data in the repository

Possible strategies

  • REST API or CMIS
    • write importer in JavaScript, Python ...
    • plug with ETL or ESB 
  • Core-IO (Java)
    • Complete Import/Export pipe with transformers
    • Ideal for swapping storage backend
  • CSV importer
    • User facing import
  • Bulk Import Framework
    • High speed import
    • Bulk mode / optimized transaction management
    • Samples with different models

Blobs

  • Blobs can be "pre-imported"
    • pre-fill BlobStore / S3 bucket
       
  • Use async computation
    • previews and pictures conversions   

Performance

I/O bound / depends on backend

Single Server
6 core HT 3.5 GHz
126 GB RAM
std hdd

Update via REST API

  • 540 docs/s using SQL backend
  • 990 docs/s using MongoDB/WiredTiger (+80%)

Nuxeo REST API

Flexible, Extensible, Composable

Why API is Key for us

 

  • Nuxeo  Repository is a backend 
    • Portals, Mobile Apps, ERP, CRM ...
       
  • API is UI 
    • for the developers
    • HTML5/JS

API Challenge

"One API", but multiple combinations of services, plugins and Domain Models

Expose a Platform, not an application

Developers using the platform want to expose the API of their Application

API Challenge

  • Flexible
    • Adapt to client requirements
       
  • Extensible
    • Enable adding new API
       
  • Composable
    • Expose application specific API

DESIGN Principles

  • Do not lose our soul
    • fight to keep the dynamicity of the platform!
       
  • Be practical
    • Being useful is more important than REST purism

  • Dogfooding is key
    • if this is not good enough internally, this is not good

  • Building the API is part of the development cycle
    • adding an HTTP API should never be a task for later
       

Expose simple resources

Get a Document

GET /nuxeo/api/v1/path/movies/star-wars HTTP/1.1
{
  "entity-type": "document",
  "repository": "default",
  "uid": "5b352650-e49e-48cf-a4e3-bf97b518e7bf",
  "path": "/movies/star-wars",
  "type": "MovieCollection",
  "isCheckedOut": true,
  "title": "Star Wars",
  "facets": [
    "Folderish"
  ]
}

Server returns a minimal payload

Adaptive marshaling

  • Clients need to control which data schemas are sent
GET /nuxeo/api/v1/path/movies/star-wars HTTP/1.1
X-NXProperties: dublincore, common
            
{
  "entity-type": "document",
  "repository": "default",
  "uid": "5b352650-e49e-48cf-a4e3-bf97b518e7bf",
  "path": "/movies/star-wars",
  "type": "MovieCollection",
  "isCheckedOut": true,
  "title": "Star Wars",
  "properties": {
    ...
    "common:icon": "/icons/movieCollection.png",
    "dc:description": "Star Wars collection",
    "dc:creator": "tiry",
    "dc:modified": "2015-10-22T02:12:59.07Z",
    "dc:lastContributor": "tiry",
    "dc:created": "2015-10-22T02:12:59.07Z",
    "dc:title": "Star Wars",
    ...
    "dc:contributors": ["tiry", "system"]
  },
  "facets": [
    "Folderish"
  ]
}
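On the client side, selecting schemas comes down to adding one header to the GET request. The sketch below only describes the request as a dictionary and sends nothing; `BASE_URL` and `document_request` are assumptions made for the example, and the header name is copied from the slide.

```python
from urllib.parse import quote

BASE_URL = "http://localhost:8080/nuxeo/api/v1"  # assumed local server


def document_request(path: str, schemas: list[str]) -> dict:
    """Describe a GET for a document, asking for the given schemas."""
    return {
        "method": "GET",
        # quote() keeps "/" as-is and escapes spaces or other unsafe characters
        "url": f"{BASE_URL}/path{quote(path)}",
        "headers": {"X-NXProperties": ", ".join(schemas)},
    }
```

With an empty schema list the header value is empty, which corresponds to the minimal payload case.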

Fetching CONTEXTUAL data

  • Client may require more data
    • get Document children at the same time
    • get the breadcrumb data
    • get thumbnail or preview url
    • ...
       
  • Client asks for the data
    • using Headers
    • using Query String parameters  

Fetching CONTEXTUAL data

Marshaling registry is pluggable

custom Enrichers can be contributed

"How the data is fetched"
is a server side matter 

Fetching CONTEXTUAL data

GET /nuxeo/api/v1/path/movies/star-wars HTTP/1.1
X-NXenrichers.document: thumbnail
            
{
  "entity-type": "document",
  "repository": "default",
  "uid": "5b352650-e49e-48cf-a4e3-bf97b518e7bf",
  "path": "/movies/star-wars",
  "type": "MovieCollection",
  "isCheckedOut": true,
  "title": "Star Wars",
  "contextParameters":
   {
     "thumbnail":
     {
       "url": "/nuxeo/nxthumb/default/5b352650-e49e-48cf-a4e3-bf97b518e7bf/thumb:thumbnail/Small_photo.jpg"
     }
  },
  "facets": [
    "Folderish"
  ]
}
GET /nuxeo/api/v1/path/movies/star-wars?enrichers.document=thumbnail HTTP/1.1
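A client can request an enricher through the query string and then read the extra data from `contextParameters`. The sketch below is a minimal illustration based on the payload shown above; the function names are hypothetical and no HTTP call is made.

```python
import json
from typing import Optional
from urllib.parse import urlencode


def enricher_query(enricher: str) -> str:
    """Query string asking for one document enricher."""
    return urlencode({"enrichers.document": enricher})


def thumbnail_url(response_body: str) -> Optional[str]:
    """Extract the thumbnail URL from a document payload, if present."""
    doc = json.loads(response_body)
    return doc.get("contextParameters", {}).get("thumbnail", {}).get("url")


# Trimmed copy of the response shown above.
sample = json.dumps({
    "entity-type": "document",
    "uid": "5b352650-e49e-48cf-a4e3-bf97b518e7bf",
    "contextParameters": {
        "thumbnail": {
            "url": "/nuxeo/nxthumb/default/5b352650-e49e-48cf-a4e3-bf97b518e7bf"
                   "/thumb:thumbnail/Small_photo.jpg"
        }
    },
})
```

A payload fetched without the enricher has no `contextParameters` entry, so `thumbnail_url` returns `None` rather than failing.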

Retrieve Linked Data

  • Resolve entity fields
    • pointing to a label
    • pointing to another Document
    • pointing to a User
    • ...
Implicit JOIN

Retrieve Linked Data

  • Use client side parameters to know what to resolve
    • header
    • QueryString parameter
  • Can be recursive
    • client needs to control that too!
fetch.objectType=fieldToFetch

translate.objectType=fieldToTranslate

depth=children

Retrieve Linked Data

Adapters

  • Change the return type
    • get only ACLs or History info about the Document
    • get the tasks associated with the Document
  • Use your own business object
    • use business Adapters
      • wrap document or documents
      • provide custom marshaling
GET /nuxeo/api/v1/path/movies/star-wars@acl HTTP/1.1
GET /nuxeo/api/v1/path/movies/star-wars@audit HTTP/1.1
GET /nuxeo/api/v1/path/movies/star-wars@bo/MyBusinessObject HTTP/1.1

Adapters

GET /nuxeo/api/v1/path/movies/star-wars@bo/MovieCollection HTTP/1.1
{
    "entity-type": "MovieCollection",
    "id": "5b352650-e49e-48cf-a4e3-bf97b518e7bf",
    "title": "Star Wars",
    "episodes": 7
}

Blobs

  • Sent as links
    • Digest
    • CDN



       
  • Uploaded out-of-band
    • chunking
    • reference in JSON

Blob Upload

  • Upload EndPoint
POST /api/v1/upload/{batchId}/{fileIdx} HTTP/1.1
X-Upload-Chunk-Index: 0
X-Upload-Chunk-Count: 5

  • Reference Blobs from JSON Payload
PUT /nuxeo/api/v1/path/movies/star-wars HTTP/1.1
{"entity-type": "document",
 "properties": {
   "file:content": {
     "upload-batch": "0b0061d48f69b072",
     "upload-fileId": 0,
     "type": "blob"
   }
 }
}
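The two-step flow above (upload chunks out-of-band, then reference the batch from the document JSON) can be sketched as follows. This only describes the requests and sends nothing; the function names are hypothetical, and only the two chunk headers shown on the slide are set, so a real server may require additional headers.

```python
def chunk_requests(batch_id: str, file_idx: int,
                   data: bytes, chunk_size: int) -> list[dict]:
    """Split data into chunks and describe one upload POST per chunk."""
    chunks = [data[i:i + chunk_size]
              for i in range(0, len(data), chunk_size)] or [b""]
    return [
        {
            "method": "POST",
            "path": f"/api/v1/upload/{batch_id}/{file_idx}",
            "headers": {
                "X-Upload-Chunk-Index": str(index),
                "X-Upload-Chunk-Count": str(len(chunks)),
            },
            "body": chunk,
        }
        for index, chunk in enumerate(chunks)
    ]


def blob_reference(batch_id: str, file_idx: int) -> dict:
    """Properties fragment pointing file:content at an uploaded blob."""
    return {"file:content": {"upload-batch": batch_id,
                             "upload-fileId": file_idx,
                             "type": "blob"}}
```

Splitting 10 bytes with a chunk size of 2 yields five requests, matching the `X-Upload-Chunk-Count` value used in the slide example.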

Need a way to map 100+ Services

Without creating 100 endpoints!

Need another paradigm!

Command synopsis

Command

INPUT
(Doc, Blob, User ...)

OUTPUT
(Doc, Blob, User ...)

Parameters

Context
(User, Doc ...)

Commands

            
                WebUI.AddErrorMessage   WebUI.AddInfoMessage   WebUI.AddMessage   Document.AddPermission   Document.AddToCollection   DocumentMultivaluedProperty.addItem   Task.ApplyDocumentMapping   Blob.AttachOnDocument   BlobHolder.AttachOnCurrentDocument   AttachFiles   Audit.QueryWithPageProvider   Blob.ImportClipboard   Blob.ImportWorklist   Blob.RunConverter   Document.BlockPermissionInheritance   WorkflowModel.BulkRestartInstances   Business.BusinessCreateOperation   Business.BusinessFetchOperation   Business.BusinessUpdateOperation   Navigation.GoBack   WorkflowInstance.Cancel   Navigation.ChangeCurrentTab   Document.CheckIn   Document.CheckOut   Update.NextStep.ConditionalFolder   WebUI.ClearClipboard   WebUI.ClearSelectedDocuments   WebUI.ClearWorklist   WorkflowTask.Complete   Blob.ConcatenatePDFs   Context.FetchDocument   Context.FetchFile   Blob.ToPDF   Blob.Convert   Document.Copy   Document.Create   FileManager.Import   UserWorkspace.CreateDocumentFromBlob   Seam.CreateDocumentInUI   Picture.Create   Document.CreateLiveProxy   Document.AddRelation   Collection.Create   Workflow.CreateRoutingTask   Task.Create   Directory.CreateEntries   Document.Delete   Document.DeleteRelation   Directory.DeleteEntries   WebUI.DestroySeamContext   Repository.GetDocument   Document.Export   WebUI.DownloadFile   Blob.ExportToFS   Document.FetchByProperty   Blob.CreateFromURL   FileManager.ImportInSeam   FileManager.ImportWithMetaData   FileManager.ImportWithMetaDataInSeam   Document.Filter   Document.FollowLifecycleTransition   Comment.Moderate   Document.GetBlobs   Document.GetChild   Document.GetChildren   Document.GetBlob   Document.GetBlobsByProperty   User.GetUserWorkspace   Document.GetLinkedDocuments   Proxy.GetSourceDocument   User.Get   Document.GetParent   Context.GetEmailsWithPermissionOnDoc   Context.GetTaskNames   Context.GetUsersGroupIdsWithPermissionOnDoc   Document.GetVersions   Directory.Projection   Collection.Suggestion   User.GetCollections   
Directory.Entries   Directory.SuggestEntries   Collection.GetDocumentsFromCollection   Favorite.GetDocuments   Document.Routing.GetGraph   Picture.GetView   Workflow.GetOpenTasks   Tag.Suggestion   Task.GetAssigned   UserGroup.Suggestion   Document.GetRendition   Blob.PostToURL   Image.Blob.Resize   WebUI.InitSeamContext   JsonStack.ToggleDisplay   Actions.GET   GetRepositories   Document.Lock   Log   Audit.LogEvent   Auth.LoginAs   Auth.Logout   Document.Move   Document.PublishToSections   NRD-AC-PR-ChooseParticipants-Output   NRD-AC-PR-LockDocument   NRD-AC-PR-UnlockDocument   NRD-AC-PR-ValidateNode-Output   NRD-AC-PR-force-validate   NRD-AC-PR-storeTaskInfo   WebUI.NavigateTo   NuxeoDrive.SetActiveFactories   NuxeoDrive.AddToLocallyEditedCollection   NuxeoDrive.AttachBlob   NuxeoDrive.CanMove   NuxeoDrive.CreateFile   NuxeoDrive.CreateFolder   NuxeoDrive.CreateTestDocuments   NuxeoDrive.Delete   NuxeoDrive.FileSystemItemExists   NuxeoDrive.GenerateConflictedItemName   NuxeoDrive.GetRoots   NuxeoDrive.GetChangeSummary   NuxeoDrive.GetChildren   NuxeoDrive.GetClientUpdateInfo   NuxeoDrive.GetFileSystemItem   NuxeoDrive.GetTopLevelFolder   NuxeoDrive.GetTopLevelChildren   NuxeoDrive.Move   NuxeoDrive.SetSynchronization   NuxeoDrive.Rename   NuxeoDrive.SetVersioningOptions   NuxeoDrive.SetupIntegrationTests   NuxeoDrive.TearDownIntegrationTests   NuxeoDrive.UpdateFile   NuxeoDrive.WaitForElasticsearchCompletion   NuxeoDrive.WaitForAsyncCompletion   Repository.PageProvider   Context.PopDocument   Context.PopDocumentList   Context.PopBlob   Context.PopBlobList   Document.PublishToSection   Context.PullDocument   Context.PullDocumentList   Context.PullBlob   Context.PullBlobList   Context.PushDocument   Context.PushDocumentList   Context.PushBlob   Context.PushBlobList   WebUI.AddToClipboard   WebUI.PushDocumentToSeamContext   WebUI.AddToWorklist   LocalConfiguration.PutSimpleConfigurationParameters   LocalConfiguration.PutSimpleConfigurationParameter   
Repository.Query   Audit.Query   Repository.ResultSetPageProvider   WebUI.RaiseSeamEvents   Blob.ReadMetadata   Context.SetMetadataFromBlob   Directory.ReadEntries   WebUI.Refresh   WebUI.Refresh   Document.RemoveACL   Services.RemoveDocumentTags   Document.RemoveEntryOfMultivaluedProperty   Blob.RemoveFromDocument   Document.RemovePermission   Document.RemoveProperty   Collection.RemoveFromCollection   Render.Document   Render.DocumentFeed   TemplateProcessor.Render   Document.ReplacePermission   Document.Reload   Picture.Resize   Context.RestoreDocumentInput   Context.RestoreDocumentsInput   Context.RestoreBlobInput   Context.RestoreBlobsInput   Document.RestoreVersion   Context.RestoreBlobInputFromScript   Context.RestoreBlobsInputFromScript   Context.RestoreDocumentInputFromScript   Context.RestoreDocumentsInputFromScript   Repository.ResultSetQuery   Document.Routing.Resume.Step   Workflow.ResumeNode   Counters.GET   RunOperation   RunDocumentOperation   Context.RunDocumentOperationInNewTx   RunFileOperation   RunOperationOnList   RunOperationOnProvider   RunOperationOnListInNewTx   RunInputScript   RunScript   WebUI.RunOperationInSeam   Document.Save   Seam.SaveDocumentInUI   Repository.SaveSession   SeamActions.GET   Document.Mail   Event.Fire   Document.AddACE   Context.SetVar   Context.SetInputAsVar   LocalConfiguration.SetSimpleConfigurationParameterAsVar   Document.Routing.SetRunningStepFromTask   Document.SetBlob   Document.SetBlobName   WebUI.SetJSFOutcome   Workflow.SetNodeVariable   Document.Routing.Step.Done   Document.Routing.BackToReady   Document.Routing.EvaluateCondition   Context.SetWorkflowVar   WebUI.ShowCreateForm   Document.CreateVersion   Context.StartWorkflow   Search.SuggestersLauncher   Services.TagDocument   Traces.Get   Traces.ToggleRecording   Document.SetMetadataFromBlob   Seam.GetChangeableDocument   Seam.FetchFromClipboard   Seam.GetCurrentDocument   Seam.GetCurrentDomain   Seam.GetCurrentWorkspace   Seam.FetchDocument   
Seam.GetSelectedDocuments   Seam.GetDocumentsFromSelectionList   Seam.FetchFromWorklist   Document.Unlock   Document.UnblockPermissionInheritance   Services.UntagDocument   Document.Update   Document.SetProperty   Document.Routing.UpdateCommentsInfoOnDocument   Directory.UpdateEntries   Workflow.UserTaskPageProvider   VersionAndAttachFile   VersionAndAttachFiles   Blob.SetMetadataFromDocument   Blob.SetMetadataFromContext   Blob.CreateZip   acceptComment   addCurrentDocumentToWorklist   blobToPDF   cancelWorkflow   conditionalTask   decideNextStepAndSimpleValidate   downloadFilesZip   evaluateCondition   followLifeCycleTransition   followLifeCycleTransitionTask   initInitiatorComment   logInAudit   nextAssignee   notifyInitiatorEndOfWorkflow   publishDocument   publishTask   reinitAssigneeComment   rejectComment   Workflow.RemoveRoutingTask   sendTaskCreatedNotificationMail   setDone   setNextStep   setTaskDone   simpleChooseNextOption1AndDone   simpleChooseNextOption2AndDone   simpleRefuse   simpleTask   simpleUndo   simpleValidate   terminateWorkflow   undoRunningTask   updateCommentsOnDoc   validateDocument   voidChain   xmlExportRendition   zipTreeExportRendition
            
        

 

Favorite.GetDocuments
Blob.ToPDF
Image.Blob.Resize
Document.AddRelation
Workflow.CreateRoutingTask

lots of contributed operations

Principles

Commands as  REST resources

  • GET  to retrieve definition

  • POST  to execute

Get an Operation

GET /nuxeo/api/v1/automation/Document.PageProvider HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json
{
    "id":"Document.PageProvider",
    "label":"PageProvider",
    "description":"Perform a query ...",
    "signature":[ "void", "documents" ],
    "params":[
      {  "name":"page",
         "type":"integer",
         "required":false
      },{
         "name":"query",
         "type":"string",
         "required":false },
      ... ]
}

Run an Operation

POST /nuxeo/api/v1/automation/Document.PageProvider HTTP/1.1
Content-Type: application/json+nxrequest
{ "params" :
    { "query" : "select * from Note",
      "page" : 0
    }
}
HTTP/1.1 200 OK
Content-Type: application/json
{
  "entity-type": "documents",
  "pageIndex": 0,
  "pageSize": 2,
  "pageCount": 2,
  "entries": [
    {
      "entity-type": "document",
      "repository": "default",
      "uid": "3f76a415-ad73-4522-9450-d12af25b7fb4",
      ...
    }, { ...}, ...
 ]
}
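The request and response above can be handled with a few lines of client code. The sketch below builds the POST description and reads the `uid` of each returned entry; `operation_request` and `entry_uids` are hypothetical helper names, and no network call is made.

```python
import json


def operation_request(operation_id: str, params: dict) -> dict:
    """Describe the POST that executes an Automation operation."""
    return {
        "method": "POST",
        "path": f"/nuxeo/api/v1/automation/{operation_id}",
        "headers": {"Content-Type": "application/json+nxrequest"},
        "body": json.dumps({"params": params}),
    }


def entry_uids(response_body: str) -> list[str]:
    """Collect the uid of every document in a 'documents' response."""
    result = json.loads(response_body)
    return [entry["uid"] for entry in result.get("entries", [])]
```

The same `operation_request` helper works for any operation id, since parameters are always wrapped in a `params` object.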

Resources & Automation

  • Share the marshaling layer and extensions
    • Enrichers and Resolvers are available too
       
  • Compose Resources and Automation API
    • Pipe Resources as input for Automation Operation   
> cat /doc/path/somedoc | command(p3,p4)

Resources & Automation

RESOURCES > AUTOMATION

POST /nuxeo/api/v1/path/somePath/@op/Blob.ToPDF HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/pdf
...

More Composition

assemble API blocks without having to code 

build business API

Composable API : GOALS

  • Tailor the API to match application requirements
    • one API behind every action / button
       
  • Allow business analysts or UI developers to tailor the API
    • define what API is exposed
      • UI & Workflow needs

Automation Chain

  • Assemble operations in a chain
  • Pipe Output / Input
  • Give it a name
  • Call and execute within a single transaction
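The chain idea (operations piped output to input, sharing one context, registered under a name) can be illustrated with plain functions. This is a conceptual sketch, not the Automation engine: the operations, the chain name and the `run_chain` helper are hypothetical, and transaction handling is not modeled.

```python
from typing import Any, Callable

# An operation takes (input, context) and returns its output.
Operation = Callable[[Any, dict], Any]


def run_chain(operations: list[Operation], initial_input: Any,
              context: dict) -> Any:
    """Pipe each operation's output into the next, sharing one context."""
    value = initial_input
    for operation in operations:
        value = operation(value, context)
    return value


# Hypothetical operations, for illustration only.
def fetch_title(doc: dict, ctx: dict) -> str:
    return doc["title"]


def to_upper(text: str, ctx: dict) -> str:
    ctx["seen"] = text  # operations can share state through the context
    return text.upper()


# Chains are registered under a name so they can be called like an operation.
CHAINS = {"ShoutTitle": [fetch_title, to_upper]}
```

Running `run_chain(CHAINS["ShoutTitle"], {"title": "Star Wars"}, {})` pipes the document through both steps in order.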

Server side assembly

One Context

Assembling Chains

It does work!

  • Business users & Front end developers leverage this 
    • to expose custom API for their UI
    • to build custom logic inside their Workflows
    • to add automatic processing (listeners)
  • Actually it works almost too well
    • users do  awfully  complicated  things
    • chains calling chains calling chains ...

Scalability

Scale out Architecture

Scale Interactive Processing

Scale Batch Processing

Scale
Queries

Scale out Storage

Scale Storage

with NoSQL

HA / Multi-AZ

Geographical redundancy & disaster recovery

Mono-AZ deployment

(Manual) Multi-AZ deployment

AWS Multi-AZ deployment

NoSQL Multi-AZ deployment

Extreme Storage

Supercharging Nuxeo Repository

About Large Binary files

  • No impact on Repository performance
    • Link (digest) to the Stream stored in the Blob Manager
    • Blob upload is done outside of the transaction
  • Download
    • HTTP streaming
    • CDN integration
  • Upload
    • chunking & resume on upload
  • Processing of large files can be off-loaded

About LARGE Binary Storage

  • Backend storage is pluggable
    • several implementations (FS, S3, GridFS, GDoc, DropBox ...)
    • easy to implement
  • Can do partitions
  • Can do HSM   

Large Documents

  • Nuxeo Repository can handle arbitrarily complex schemas
    • but humans can't
  • CRUD operations
    • SQL: impact due to the number of columns / tables
      • can be tweaked with lazy-loading & prefetch 
    • NoSQL: no real impact on Read/Write, minor on update
  • Search operations
    • Elasticsearch can handle complex queries on complex documents
    • Can Scale-Out Elasticsearch as needed  

Massive number of Documents

  • CRUD operations
    • SQL: lots of rows => need sharding
    • NoSQL: MongoDB is happy with billions of Documents
  • Search operations
    • Can Scale-Out Elasticsearch as needed  

KEY Limitations of SQL - search

  • Complex SQL Queries
    • Configurable  Data Structure
    • User defined  multi-criteria searches
       
  • Scaling queries is complex
    • depends on indexes, I/O speed and available memory
    • poor performance on unselective multi-criteria queries

  • Fulltext support is usually poor
    • limitations on features & impact on performance
SELECT "hierarchy"."id" AS "_C1" FROM "hierarchy" 
   JOIN "fulltext" ON "fulltext"."id" = "hierarchy"."id" 
   LEFT JOIN "misc" "_F1" ON "hierarchy"."id" = "_F1"."id" 
   LEFT JOIN "dublincore" "_F2" ON "hierarchy"."id" = "_F2"."id" 
 WHERE 
  ("hierarchy"."primarytype" IN ('Video', 'Picture', 'File')) 
  AND ((TO_TSQUERY('english', 'sydney') 
      @@NX_TO_TSVECTOR("fulltext"."fulltext"))) 
  AND ("hierarchy"."isversion" IS NULL) 
  AND ("_F1"."lifecyclestate" <> 'deleted') 
  AND ("_F2"."created" IS NOT NULL )

ORDER BY "_F2"."created" DESC 

LIMIT 201 OFFSET 0; 

some types of queries simply cannot be fast in SQL

KEY Limitations of SQL - CRUD

  • Impedance issue
    • storing Documents in tables is not easy
    • requires Caching and Lazy loading 
       
  • Scalability
    • Document repository and Audit Log can become very large (versions, workflows ...)
    • scaling out a SQL DB is complex (and never transparent)

  • Concurrency model
    • Heavy write is an issue (Quotas, Inheritance)
    • Hard to maintain good Read & Write performance

USING NOSQL

  • No Impedance issue
    • One Nuxeo Doc = One MongoDB Document
    • No application level cache / no invalidations
       
  • No Scalability issue for CRUD
    • native distributed architecture with scale out

  • No Concurrency performance issue
    • Document Level "Transactions"

 Repository & Audit Trail

  • Fast indexing

    • No ACID constraints / No impedance issue

    • Append only index
       

  • Super query performance

    • query on term using inverted index

    • very efficient caching  

    • native full text support

    • distributed architecture

Search & Audit Trail

Hybrid Storage Architecture

  • MongoDB

    • store structures & streams in a BASE way

  • Elasticsearch

    • provide powerful and scalable queries

  • SQL DB

    • store structures in an ACID way

Storage does not impact the application: this can be a deployment choice!

ACID: Atomic, Consistent, Isolated, Durable

BASE: Basic Availability, Soft state, Eventually consistent

The choice depends on availability & performance requirements

HUGE Repository WITH NOSQL

  • Massive amount of Documents
    • x00,000,000
       
    • Automatic versioning
      • create a version for each single change
  •  Write intensive access
    • daily imports or updates
    • recursive updates (quotas, inheritance)

SQL DB collapses (on commodity hardware)
MongoDB handles the volume

About MongoDB & Consistency

  • Atomic Document operations are safe
  • Large batch updates are not much of an issue

  • Multi-document transactions are an issue
    • ex: Workflows

Transactions cannot span multiple documents

  • Transient State Manager 
    • Run all operations in Memory
    • Populate an Undo Log
  • Recover Commit / Rollback model
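The undo-log idea can be illustrated with a small in-memory example. This is a deliberately simplified sketch and not the Nuxeo Transient State Manager: here each write is applied to the store immediately and the previous value is recorded, so that a rollback can replay the log backwards. The class and method names are assumptions.

```python
class TransientState:
    """Apply writes to a store while keeping an undo log of old values."""

    _MISSING = object()  # marks keys that did not exist before the write

    def __init__(self, store: dict):
        self.store = store
        self.undo_log = []  # list of (key, previous value)

    def write(self, key, value):
        self.undo_log.append((key, self.store.get(key, self._MISSING)))
        self.store[key] = value

    def commit(self):
        # Changes are kept; the undo information is no longer needed.
        self.undo_log.clear()

    def rollback(self):
        # Replay the log backwards to restore the original values.
        for key, previous in reversed(self.undo_log):
            if previous is self._MISSING:
                self.store.pop(key, None)
            else:
                self.store[key] = previous
        self.undo_log.clear()
```

Replaying the log in reverse matters when the same key is written several times: the oldest recorded value is restored last, so the store ends up in its initial state.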

Elasticsearch & Consistency

  • Async Indexing
    • ensure convergence




       
  • Sync indexing :
    • see changes in listings in "real time"

Should I worry about Jepsen results?

  • Maintaining availability during a network partition is hard
    • Distributed systems try to stay available
      • and sometimes give access to stale data or lose updates
    • Standard ACID DBs (SQL DB, MarkLogic) do not even try
      • so the application becomes unavailable

  • Aphyr built a great testing system (Jepsen)
    • testing open source distributed systems during network partitions
    • helps improve solutions & documentation
  • Currently all distributed systems break at some point
    • Cassandra, Riak, Redis, Aerospike, NuoDB, Elasticsearch, etcd, ...
    • MongoDB is probably just the most visible one (and has evolved since)
  • Even PostgreSQL or MySQL have issues in some cases

Hybrid storage

Sample use case:
Press Agency
production system

mixed
requirements 

Security

Understanding Nuxeo Security

Security in the Repository

  • Security is always on
    • Java API / Http API / Search
       
  • ACL based default security policy
    • multiple & ordered ACLs
    • ACL inheritance or block
    • validity dates on ACE
       
  • additional pluggable security policy
    • implement custom security
      (ex: meta-data based)

Security Context

  • Security is evaluated in a Context
    • a Document
      • Security is placeful
    • a User 
      • attributes (or profiles)
      • groups
         
  • ACLs give or deny permission in a Context
    • Atomic permissions
    • Groups of permissions
    • Custom permissions
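
The evaluation described above (ordered ACLs, inheritance or block, a user with groups, grant or deny) can be sketched as follows. This is a simplified model under stated assumptions, not the Nuxeo Core API: the classes `Ace`, `User` and `Doc` and the principal and permission strings are hypothetical. Blocking inheritance is modelled as a local entry denying everything to everyone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model for illustration; not the Nuxeo Core API.
class AclSketch {

    // One access control entry: grant or deny a permission to a user or group.
    static final class Ace {
        final String principal;
        final String permission;
        final boolean grant;

        Ace(String principal, String permission, boolean grant) {
            this.principal = principal;
            this.permission = permission;
            this.grant = grant;
        }
    }

    static final class User {
        final String name;
        final Set<String> groups;

        User(String name, String... groups) {
            this.name = name;
            this.groups = new HashSet<>(Arrays.asList(groups));
        }

        boolean matches(String principal) {
            return principal.equals("Everyone") || principal.equals(name) || groups.contains(principal);
        }
    }

    static final class Doc {
        final Doc parent;
        final List<Ace> localAces = new ArrayList<>();

        Doc(Doc parent) {
            this.parent = parent;
        }

        // Local entries come first, then the ones inherited from the parent.
        List<Ace> mergedAces() {
            List<Ace> merged = new ArrayList<>(localAces);
            if (parent != null) {
                merged.addAll(parent.mergedAces());
            }
            return merged;
        }
    }

    // The first entry matching both user and permission decides; no match means no access.
    static boolean hasPermission(Doc doc, User user, String permission) {
        for (Ace ace : doc.mergedAces()) {
            boolean permissionMatches = ace.permission.equals("Everything") || ace.permission.equals(permission);
            if (permissionMatches && user.matches(ace.principal)) {
                return ace.grant;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Doc workspace = new Doc(null);
        workspace.localAces.add(new Ace("members", "Read", true));
        Doc folder = new Doc(workspace);
        Doc restricted = new Doc(workspace);
        restricted.localAces.add(new Ace("alice", "Read", true));
        // Denying everything to everyone after the local grants blocks inheritance.
        restricted.localAces.add(new Ace("Everyone", "Everything", false));

        User alice = new User("alice", "members");
        User bob = new User("bob", "members");
        System.out.println(hasPermission(folder, bob, "Read"));
        System.out.println(hasPermission(restricted, alice, "Read"));
        System.out.println(hasPermission(restricted, bob, "Read"));
    }
}
```

Because the answer depends on where the document sits in the hierarchy and on who asks, the same user can read `folder` through the inherited entry and still be refused on `restricted`.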

Permissions

  • Nuxeo defines a set of Atomic permissions
    • READ_PROPERTIES, ADD_CHILDREN, READ_LIFECYCLE ...

  • Nuxeo defines groups of permissions
    • READ, WRITE, MANAGE ...

  • Repository always checks the Atomic permissions

  • You can define custom permissions and groups of permissions

  • You can use Core API to check permissions explicitly
    • session.hasPermission(Document, Perm)
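
The relation between groups of permissions and atomic permissions can be sketched as follows. This is an illustration only: the group contents below are assumptions (the slide does not list which atomic permissions each group contains), `WRITE_PROPERTIES` and the custom group `REVIEW` are hypothetical names, and the class is not part of the Nuxeo API. The point is that a granted group is expanded, and the check itself is always made on an atomic permission.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; group contents are illustrative assumptions.
class PermissionGroups {
    static final Map<String, List<String>> GROUPS = new HashMap<>();
    static {
        GROUPS.put("READ", Arrays.asList("READ_PROPERTIES", "READ_LIFECYCLE"));
        GROUPS.put("WRITE", Arrays.asList("ADD_CHILDREN", "WRITE_PROPERTIES"));
        // A custom group can mix existing groups and atomic permissions.
        GROUPS.put("REVIEW", Arrays.asList("READ", "ADD_CHILDREN"));
    }

    // Expand a permission into the set of atomic permissions it implies.
    static Set<String> expand(String permission) {
        Set<String> atomic = new TreeSet<>();
        List<String> members = GROUPS.get(permission);
        if (members == null) {
            atomic.add(permission); // not a group, so already atomic
            return atomic;
        }
        for (String member : members) {
            atomic.addAll(expand(member));
        }
        return atomic;
    }

    // The check is made on an atomic permission, whatever was granted.
    static boolean allows(Collection<String> granted, String atomicPermission) {
        for (String permission : granted) {
            if (expand(permission).contains(atomicPermission)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(expand("REVIEW"));
        System.out.println(allows(Arrays.asList("READ"), "ADD_CHILDREN"));
    }
}
```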

Security in UI & Services

(also available in Directories)

Security Granularity

  • Security is checked at Document Level

    • field/schemas do not hold ACLs

  • No field level security



     

  • Download action is checked by a custom Download Policy

    • depending on Document, File, XPath, User

  • Can view document meta-data without being able
    to download or preview

     

Security Granularity

  • Compound Documents

    • use nested documents with different ACLs



       

    • Handle finer grained security
       

  • Custom API for custom visibility

    • Leverage Custom indexing in Elasticsearch

    • Custom marshaling layer in the Rest API

    • Expose data that would otherwise not be accessible

 

Handling Complex Security

  • ACL based

    • Computed Groups

      • i.e. compute groups based on user attributes

    • Automatically apply ACLs

      • i.e. Listeners and Automation

    • Complex to manage, test and update
       

  • Security Policies

    • Integrate custom logic at the core of the security system

    • Initially introduced for "military" use cases

    • Low administration + Good testability
       

Security Policy

  • Atomic permission check: Checkperm

    • Override or complement ACL based security

    • Java-based logic to grant or deny access based on

      • Document (including attributes)

      • User (including attributes)
         

  • Search security filtering: QueryTransformer

    • Avoid post-filtering in search

      • generate additional where clause

    • Allows custom security to scale with large queries
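
The two halves of a security policy described above can be sketched as follows. This is a deliberately simplified illustration, not the real Nuxeo interface: the class `ClearancePolicySketch`, the document field `sec:level` and the user attribute `clearance` are hypothetical, and the query rewriting is naive string handling where a real transformer would work on the parsed query. The same rule is expressed twice: once as a per-document check, once as an extra where clause so that search results never need post-filtering.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical metadata-based policy; names and fields are assumptions.
class ClearancePolicySketch {
    enum Access { GRANT, DENY, UNKNOWN }

    // Deny when the document level exceeds the user's clearance;
    // otherwise return UNKNOWN so that ACL based security decides.
    static Access checkPermission(Map<String, Object> docProps, Map<String, Object> userAttrs, String permission) {
        int docLevel = (Integer) docProps.getOrDefault("sec:level", 0);
        int userLevel = (Integer) userAttrs.getOrDefault("clearance", 0);
        return docLevel > userLevel ? Access.DENY : Access.UNKNOWN;
    }

    // Express the same rule as a where clause so the query engine filters results itself.
    // Naive on purpose: it does not handle OR conditions or ORDER BY.
    static String transformQuery(String query, Map<String, Object> userAttrs) {
        int userLevel = (Integer) userAttrs.getOrDefault("clearance", 0);
        String clause = "sec:level <= " + userLevel;
        return query.toUpperCase().contains(" WHERE ")
                ? query + " AND " + clause
                : query + " WHERE " + clause;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("sec:level", 3);
        Map<String, Object> user = new HashMap<>();
        user.put("clearance", 2);
        System.out.println(checkPermission(doc, user, "READ"));
        System.out.println(transformQuery("SELECT * FROM Document", user));
    }
}
```

Returning `UNKNOWN` rather than `GRANT` keeps the policy as a complement to ACLs: it can only restrict, and everything it does not forbid is still subject to the regular ACL check.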

Directory Abstraction

Security Content vs Origin


Authentication

  • Nuxeo provides a pluggable Authentication system
    • Basic Auth, Form, Token, Kerberos, OAuth,
    • OpenID, CAS2, Shibboleth, SAML2, ...
    • Keycloak, Okta, DuoWeb

Nuxeo Event Bus

Listeners & Queues

Nuxeo Event Bus

  • Nuxeo fires events for all actions
    • Document Create / Update / Delete
    • Workflow starts
    • User logged in
  • Events can be handled
    • by synchronous inline listeners
      • intercept the ongoing action
      • can block / roll back processing
    • by asynchronous listeners
      • receive the transaction's events as a bundle
  • Scheduled asynchronous workers are persisted
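
The difference between the two listener types can be sketched as follows. This is a minimal model under stated assumptions, not the Nuxeo event API: the class `EventBusSketch`, its methods and the event name strings are hypothetical. Synchronous listeners run inline and can veto the action by throwing; asynchronous listeners only see the events once the transaction commits, as one bundle. For simplicity the bundle is delivered directly at commit time, where a real system would hand it to a persisted work queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch for illustration; not the Nuxeo event API.
class EventBusSketch {
    private final List<Consumer<String>> syncListeners = new ArrayList<>();
    private final List<Consumer<List<String>>> asyncListeners = new ArrayList<>();
    // Events fired in the current transaction, waiting for commit.
    private final List<String> pending = new ArrayList<>();

    void addSync(Consumer<String> listener) {
        syncListeners.add(listener);
    }

    void addAsync(Consumer<List<String>> listener) {
        asyncListeners.add(listener);
    }

    // Sync listeners run inline; an exception reaches the caller, which can roll back.
    void fire(String event) {
        for (Consumer<String> listener : syncListeners) {
            listener.accept(event);
        }
        pending.add(event);
    }

    // On commit, async listeners receive all events of the transaction as one bundle.
    void commit() {
        List<String> bundle = new ArrayList<>(pending);
        pending.clear();
        for (Consumer<List<String>> listener : asyncListeners) {
            listener.accept(bundle);
        }
    }

    // On rollback, nothing is delivered to async listeners.
    void rollback() {
        pending.clear();
    }

    public static void main(String[] args) {
        EventBusSketch bus = new EventBusSketch();
        bus.addSync(e -> {
            if (e.equals("documentRemoved")) {
                throw new IllegalStateException("removal vetoed");
            }
        });
        bus.addAsync(bundle -> System.out.println("indexing " + bundle));
        bus.fire("documentCreated");
        bus.fire("documentModified");
        bus.commit();
    }
}
```

Delivering the bundle only after commit means an asynchronous consumer such as an indexer never processes changes that were later rolled back.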

Nuxeo Event Bus

Example: ES Indexing

Audit Service

Custom Audit

  • Audit log is customizable
    • choice of events to track
    • what information to log with each event
    • where to log

      the only limit is storage!
       
  • When using a NoSQL backend, it is easy to
    • Store all changes for all Documents
    • Store per-transaction changes    

Current Event Bus

Target Event Bus (8.10+)

Target Gains

  • Redis / Kafka switch
    • High throughput resilient queuing
    • Better bulk import
       
  •  Custom Audit Queuing
    • More efficient Audit log
       
  • Plug Nuxeo Event Bus
    • Batch Notifications
    • Activities
    • Reactive UI 

Monitoring

Scale out Architecture

Monitoring Nuxeo

  • Integrates Coda Hale (Yammer) Metrics
  • Nuxeo exposes core metrics
    • repository activity
    • transactions
    • async jobs
    • elasticsearch
    • ...
  • Can deploy application-specific metrics

     
  • Metrics are exposed via JMX
    • can be exposed via HTTP too

Graphite Dashboard

Sample dashboard provided for Graphite

Nuxeo Metrics

System Metrics

DataDog Dashboard

Integrate with DataDog

Nuxeo & AWS

Deploying Nuxeo on AWS

Nuxeo Logical architecture

Leverage AWS Services


API driven provisioning and deployment

transparent fail-over

easy scalability

 

Deploy on AWS

  • AMI
    • Use Linux AMI + Debian Packages + Nuxeo Packages
    • Build custom AMI with Packer
       
  • CloudFormation
    • Template to provision and deploy Nuxeo
       
  • Ansible
    • Highly customized Nuxeo deployment
       
  • Terraform
    • Infrastructure as Code + independence from IaaS

Leverage AWS Services

Multi-tenancy

Isolated configuration in Nuxeo

Requirement

Application Level Multi-Tenants

Document Store 
Security
Life Cycle
Indexing
Versioning

all clients share the same application

application manages data and configuration partitioning

Application Level Multi-Tenants 

Shallow isolation

  • quota management is not efficient  
  • customization options are limited
     

Monolithic  

  • same version, same component set
  • same upgrade and maintenance policy
     

Not even simple

  • scale out is not that easy (e.g. moving a tenant)
  • per-tenant Backup/Restore is not easy
  • Heterogeneous deployment units
    VM level / JVM level / App level

Cannot leverage OSGi / Extension Point model
 

Not a "Cloud Native" approach
 

Container Level Multi-tenants

rely on infrastructure to provide tenant isolation

application does not need to be impacted
 

Flexible
Unlimited Customization
Full Isolation & Quotas

Application Factory

Create an "on demand" application for each customer

  • use Container level isolation
  • provision infrastructure from the Cloud  
  • custom assembly for each customer

Build Your Own Application

Deploy & Run !

nuxeo.io

  • nuxeo.io v1
    • Built on a very young Docker ecosystem
    • Docker / CoreOS / Fleet
    • Lots of custom glue
       
  • nuxeo.io v2
    • Align with all the converging work on Docker
    • Focus on Nuxeo specific requirements
    • Docker / Swarm / Rancher

Nuxeo.io v1

Nuxeo.io v2 - Rancher

Nuxeo.io with Rancher

Roadmap

What we are working on

http://roadmap.nuxeo.com/

From Roadmap to Jira

  • Roadmap items are associated with Jira Epics Repository
    • Jira is public: https://jira.nuxeo.com/browse/NXP
    • You can track and comment on issues
    • Issues are assigned to a dev and associated with a target release
       
  • GitHub is another way to track our progress
    • All commits are associated with Jira issues
    • All main features are available in separate branches

Comments, PRs and votes are welcome!

Any Questions?

Thank You!

Architecture Overview

By Thierry Delprat
