OBiBa Stack
OBiBa
Open source software for BioBanks
- Mission: software solutions for epidemiological data management and analysis
- IT team of Maelstrom Research
The software stack
- Onyx: collection
- Opal: storage, curation
- Mica: publication
- DataSHIELD: analysis
- Agate: central authentication
- GPL3 license
History and supporting projects
Timeline from 2008 to 2016: Onyx, Opal, DataSHIELD and Mica, supported by the CPTP, CLSA, BioSHaRE, Maelstrom and IALSA projects
Onyx
Electronic data capture to improve data quality
- Participant interview
- Data collection:
  - questionnaires
  - instruments (images, data points...)
  - electronic consent
  - sample collection
- Data export to files or to an Opal server
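Electronic capture improves data quality mainly because answers can be checked at the moment they are entered, instead of being cleaned afterwards. The minimal Python sketch below illustrates that idea with a range check on a numeric answer; the `NumericQuestion` class, the `validate_answer` helper, the question name and the range are hypothetical and do not reflect Onyx's actual (Java) questionnaire API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NumericQuestion:
    """Hypothetical questionnaire item with an allowed value range."""
    name: str
    minimum: float
    maximum: float
    required: bool = True


def validate_answer(question: NumericQuestion, raw: str) -> Optional[str]:
    """Check an answer as it is entered; return an error message, or None if valid."""
    raw = raw.strip()
    if not raw:
        return f"{question.name}: answer required" if question.required else None
    try:
        value = float(raw)
    except ValueError:
        return f"{question.name}: '{raw}' is not a number"
    if not question.minimum <= value <= question.maximum:
        return f"{question.name}: {value} is outside [{question.minimum}, {question.maximum}]"
    return None


height = NumericQuestion("standing_height_cm", minimum=50, maximum=250)
print(validate_answer(height, "172"))   # None: accepted
print(validate_answer(height, "1720"))  # rejected at entry time (likely a typo)
```

The interviewer sees the error immediately and can correct it with the participant present, which is the point of capturing data electronically.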
Onyx architecture
- Clients: Browser, JWS (Java Web Start)
- Server: Web UI (Wicket), Security (Shiro), Service (Spring), Persistence (Hibernate), SQL database, JWS
Onyx security
- Shiro realms
  - Onyx user directory
  - ...
- Role-based permissions
  - administrator
  - participant manager
  - interviewer
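Role-based permissions grant actions to roles rather than to individual users, so adding an interviewer only means assigning a role. A minimal sketch of the check, assuming hypothetical permission names; only the three role names come from Onyx.

```python
# Hypothetical role -> permission mapping; only the role names come from Onyx.
ROLE_PERMISSIONS: dict[str, frozenset[str]] = {
    "administrator": frozenset(
        {"manage_users", "manage_participants", "conduct_interview", "export_data"}
    ),
    "participant manager": frozenset({"manage_participants"}),
    "interviewer": frozenset({"conduct_interview"}),
}


def is_permitted(user_roles: set[str], action: str) -> bool:
    """A user may perform an action if at least one of their roles grants it."""
    return any(action in ROLE_PERMISSIONS.get(role, frozenset()) for role in user_roles)


print(is_permitted({"interviewer"}, "conduct_interview"))  # True
print(is_permitted({"interviewer"}, "export_data"))        # False
```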
Onyx deployment
- Client
  - Any OS
  - Browser
  - Java (if data is extracted from instruments)
- Server
  - Linux (recommended)
  - Web application container (Tomcat, Jetty)
  - Java 7
  - MySQL (other databases not tested)
Opal
Central data repository for data curation, analysis, and harmonization
- Data import from files, Opal or LimeSurvey
- Data storage in SQL and/or MongoDB
- Data transformation, curation and harmonization using derived variables
- Data export
- Participant identifiers management
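Harmonization with derived variables means computing one common target variable from each study's own source variables. The sketch below shows the idea in plain Python; it is not Opal's derivation syntax, and the variable names (`height_cm`, `height_in`) and the unit conversion are hypothetical examples.

```python
from typing import Callable, Optional

# A derivation maps one source record (variable name -> value) to a target value.
Derivation = Callable[[dict], Optional[float]]

INCH_TO_CM = 2.54


def derive_height_cm(record: dict) -> Optional[float]:
    """Harmonize height to centimetres from whichever source variable exists."""
    if record.get("height_cm") is not None:
        return float(record["height_cm"])
    if record.get("height_in") is not None:
        return round(float(record["height_in"]) * INCH_TO_CM, 1)
    return None  # missing in the source: leave the harmonized value empty


def harmonize(records: list[dict], derivation: Derivation) -> list[Optional[float]]:
    """Apply the derivation to every participant record of a study."""
    return [derivation(record) for record in records]


study_a = [{"height_cm": 170}, {"height_cm": None}]  # study measuring in cm
study_b = [{"height_in": 65}]                        # study measuring in inches
print(harmonize(study_a, derive_height_cm) + harmonize(study_b, derive_height_cm))
# [170.0, None, 165.1]
```

Keeping the derivation as a separate, named rule (rather than editing the source data) is what lets the original values stay untouched while the harmonized dataset is rebuilt on demand.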
Opal architecture
- Clients: Browser, R, Python, ...
- Server: Web Server (Jetty), Web UI (GWT), Web Services (REST), Security (Shiro), Service (Spring), Search (Elasticsearch), R/DataSHIELD, Persistence (SQL, MongoDB, ...)
- Other: file system (Fs), R (server), ...
Opal security
- Shiro realms
  - Opal user directory
  - ...
- Access control list (ACL) over REST resources
  - View dictionary and summaries
  - Edit dictionary and view summaries
  - View dictionary and values
  - Administrate
  - DataSHIELD
  - ...
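An access control list over REST resources attaches permissions to resource paths. A minimal sketch, assuming that a grant on a path also covers every resource beneath it; the resource paths and principals are hypothetical, and Opal's actual resource URLs and inheritance rules may differ. Only the permission labels come from the list above.

```python
from collections import defaultdict

# ACL: resource path -> principal -> set of permission labels.
ACL: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))


def grant(resource: str, principal: str, permission: str) -> None:
    """Record that a principal holds a permission on a resource path."""
    ACL[resource][principal].add(permission)


def has_permission(principal: str, resource: str, permission: str) -> bool:
    """Check the resource, then each parent path (/a/b/c, /a/b, /a)."""
    parts = resource.strip("/").split("/")
    for i in range(len(parts), 0, -1):
        path = "/" + "/".join(parts[:i])
        if permission in ACL.get(path, {}).get(principal, set()):
            return True
    return False


grant("/datasource/study1", "alice", "View dictionary and summaries")
print(has_permission("alice", "/datasource/study1/table/baseline",
                     "View dictionary and summaries"))  # True
print(has_permission("alice", "/datasource/study1/table/baseline",
                     "View dictionary and values"))     # False
```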
Opal deployment
- Client
  - Any OS
  - Browser
- Server
  - Linux (preferred): Debian, Red Hat or zip package
  - Java 8
  - MySQL, PostgreSQL and/or MongoDB
  - R server (optional)
DataSHIELD
Distributed statistical analysis without access to individual-level data
- Implemented using R and Opal
  - dataset access control
  - dataset pushed from the database to R
  - restricted set of R commands
- R packages are developed by Paul Burton's team with OBiBa technical support
DataSHIELD architecture
- Client: R (client)
- Servers: Opal 1 with R (server) 1, ..., Opal n with R (server) n
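In this architecture the R client never receives individual-level rows: each server returns only non-disclosive aggregates, and the client combines them. A minimal Python sketch of that principle for a pooled mean, where each server returns a count and a sum; the `server_summary` helper and the minimum count of 5 are hypothetical (the actual DataSHIELD functions are R packages with their own disclosure settings).

```python
from dataclasses import dataclass

MIN_COUNT = 5  # hypothetical disclosure threshold: refuse very small subsets


@dataclass
class Summary:
    count: int
    total: float


def server_summary(values: list[float]) -> Summary:
    """Runs server-side: returns only aggregates, never the values themselves."""
    if len(values) < MIN_COUNT:
        raise ValueError("subset too small, result could be disclosive")
    return Summary(count=len(values), total=sum(values))


def pooled_mean(summaries: list[Summary]) -> float:
    """Runs client-side: combines the aggregates returned by all servers."""
    n = sum(s.count for s in summaries)
    return sum(s.total for s in summaries) / n


# Each list stands in for data held privately on a different Opal/R server.
server_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
server_2 = [6.0, 7.0, 8.0, 9.0, 10.0, 11.0]
print(pooled_mean([server_summary(server_1), server_summary(server_2)]))  # 6.0
```

The pooled result equals the mean over all 11 values, yet only two (count, sum) pairs ever leave the servers.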
DataSHIELD security
- Authentication by Opal or Agate
- Permissions by Opal
- Required permissions
  - Dataset: View dictionary and summaries
  - Operation: DataSHIELD
- Opal verifies that each R command is permitted
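Checking each R command amounts to allowlisting: a command runs only if every function it calls is configured as permitted. The Python sketch below illustrates the technique; the allowed function names are hypothetical, and extracting calls with a regular expression is only an approximation, since a real implementation should parse the R expression.

```python
import re

# Hypothetical allowlist of server-side functions a DataSHIELD client may call.
ALLOWED_FUNCTIONS = frozenset({"meanDS", "lengthDS", "c"})

# Approximation: an R identifier immediately followed by "(" is a function call.
CALL_PATTERN = re.compile(r"([A-Za-z.][A-Za-z0-9._]*)\s*\(")


def called_functions(expression: str) -> set[str]:
    """Names of all functions called anywhere in the expression."""
    return set(CALL_PATTERN.findall(expression))


def is_command_permitted(expression: str) -> bool:
    """Permit the command only if every function it calls is allowlisted."""
    return called_functions(expression) <= ALLOWED_FUNCTIONS


print(is_command_permitted("meanDS(D$age)"))   # True
print(is_command_permitted("system('ls /')"))  # False
```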
DataSHIELD deployment
- Client
  - Any OS
  - R
  - DataSHIELD R packages
- Server
  - Opal
  - R server
  - DataSHIELD R packages
Mica
Study and dataset publication
- Study consortium data portal
  - Network
  - Study
  - Dataset
  - Variable
- Data access requests
- Search engine
- Data summaries and queries
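The portal organizes metadata as networks of studies, whose datasets document variables. A simplified sketch of that hierarchy with a naive keyword search over variables; the class and field names are hypothetical, and Mica's actual document model (for example, harmonization datasets spanning several studies) and its Elasticsearch-based search are richer.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    name: str
    label: str


@dataclass
class Dataset:
    name: str
    variables: list[Variable] = field(default_factory=list)


@dataclass
class Study:
    name: str
    datasets: list[Dataset] = field(default_factory=list)


@dataclass
class Network:
    name: str
    studies: list[Study] = field(default_factory=list)


def search_variables(network: Network, keyword: str) -> list[tuple[str, str, str]]:
    """Return (study, dataset, variable) names whose variable name or label matches."""
    keyword = keyword.lower()
    return [
        (study.name, dataset.name, var.name)
        for study in network.studies
        for dataset in study.datasets
        for var in dataset.variables
        if keyword in var.name.lower() or keyword in var.label.lower()
    ]


network = Network("demo-network", [
    Study("study1", [Dataset("baseline", [
        Variable("bmi", "Body mass index"),
        Variable("age", "Age at recruitment"),
    ])]),
])
print(search_variables(network, "mass"))  # [('study1', 'baseline', 'bmi')]
```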
Mica architecture
- Clients: Browser, CMS (Drupal), Python
- Server: Web Server (Jetty), Web UI (AngularJS), Web Services (REST), Security (Shiro), Service (Spring), Search (Elasticsearch), Opal (client), Persistence (MongoDB)
- Other: Opal (server), Git
Mica security
- Shiro realms
- ACL-based permissions
Mica deployment
- Client
  - Any OS
  - Browser
- Server
  - Drupal
  - Linux (preferred): Debian, Red Hat or zip package
  - Java 8
  - MongoDB
  - Opal
  - Agate
Agate
Central authentication service
- ID provider service
  - authentication
  - user profile
  - sign up, forgot/reset password
  - OAuth2, OpenID Connect
  - single sign-on
- Email notification service
  - template-based
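With OAuth2 / OpenID Connect, a client application redirects the user to the ID provider's authorization endpoint and receives an authorization code back at its redirect URI. A minimal sketch of building that request (authorization code grant) and checking the returned `state` value; the Agate endpoint URL, client ID and redirect URI are hypothetical placeholders, not Agate's documented values.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical values: consult the Agate documentation for the real endpoint.
AUTHORIZE_ENDPOINT = "https://agate.example.org/oauth/authorize"
CLIENT_ID = "my-portal"
REDIRECT_URI = "https://portal.example.org/callback"


def authorization_url(scopes: list[str]) -> tuple[str, str]:
    """Return the URL to redirect the user to, and the anti-CSRF state to keep."""
    state = secrets.token_urlsafe(16)
    params = {
        "response_type": "code",     # authorization code grant
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "scope": " ".join(scopes),   # "openid" asks for an OpenID Connect ID token
        "state": state,
    }
    return f"{AUTHORIZE_ENDPOINT}?{urlencode(params)}", state


def state_is_valid(callback_url: str, expected_state: str) -> bool:
    """On the redirect back, the returned state must match the one that was sent."""
    returned = parse_qs(urlparse(callback_url).query).get("state", [""])[0]
    return secrets.compare_digest(returned.encode(), expected_state.encode())


url, state = authorization_url(["openid", "profile"])
print(url)
```

The client then exchanges the received code for tokens at the provider's token endpoint; that server-to-server step is omitted here.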
Agate architecture
- Clients: Browser, CMS (Drupal), Python
- Server: Web Server (Jetty), Web UI (AngularJS), Web Services (REST), Security (Shiro), Service (Spring), Persistence (MongoDB)
- Other: Templates
Agate security
- Shiro realms
- Role-based permissions
- Standards
  - OAuth2
  - OpenID Connect
  - JSON Web Token (JWT)
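A JSON Web Token carries signed claims (such as the user identity and an expiry time) that an application can verify without calling back the ID provider. A minimal sketch that signs and verifies an HS256 token with the standard library; the shared secret and claims are hypothetical, Agate may use a different signing algorithm, and in practice a maintained JWT library is preferable.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    padding = "=" * (-len(segment) % 4)  # restore the stripped base64 padding
    return base64.urlsafe_b64decode(segment + padding)


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Build header.payload.signature with an HMAC-SHA256 signature."""
    header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url_encode(json.dumps(claims).encode())
    signature = hmac.new(secret, f"{header}.{payload}".encode("ascii"), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url_encode(signature)}"


def verify_hs256(token: str, secret: bytes) -> dict:
    """Return the claims if the signature and expiry are valid, else raise ValueError."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    if json.loads(b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")  # never trust the token's own choice
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(b64url_decode(payload_b64))
    if "exp" in claims and claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims


secret = b"hypothetical-shared-secret"
token = sign_hs256({"sub": "alice", "exp": time.time() + 300}, secret)
print(verify_hs256(token, secret)["sub"])  # alice
```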
Agate deployment
- Client
  - Any OS
  - Browser
- Server
  - Linux (preferred): Debian, Red Hat or zip package
  - Java 8
  - MongoDB
User stories
- CPTP, data collection
- CLSA, data collection
- CLSA, data storage and curation
- BioSHaRE, data publication and analysis
- CPTP, data publication
- Maelstrom, meta-data publication
CPTP, data collection
- 5 cohorts
- For each cohort
  - 1 Opal
  - 1 Onyx per data collection site (DCS)
- No Internet connection
CPTP, data collection
Onyx (DCS 1) and Onyx (DCS 2) send encrypted files to Opal, which also receives data from other data sources.
CLSA, data collection
- LimeSurvey for phone interviews
- Onyx for in-home interviews
- Onyx at data collection sites
- 1 Opal for storage
CLSA, data collection
Some in-home answers are reused at the data collection sites (DCS)
Components: LimeSurvey, Onyx in-home 1, Onyx DCS 1, Opal (McMaster University), other data sources
CLSA, storage and curation
- Central Opal server at McMaster University
- Analysis Opal server at McGill University
- Nightly imports from McMaster to McGill
- R server at McGill
CLSA, storage and curation
30K participants, 8K variables, 50+ datasets, ~10 TB of data
Components: Opal (McMaster University), Opal (McGill University), R, RAID 5 storage
BioSHaRE, publication and analysis
- BioSHaRE portal: bioshare.eu
- 8 Opal+R servers hosted and managed by European universities
- 14 cohorts
- 2 harmonized datasets:
- HOP: 10 cohorts, ~200K participants
- ECP: 5 cohorts, ~750K participants
BioSHaRE, publication
Components: Mica (Amazon), Opal (Groningen University), Opal (Imperial College), ...
BioSHaRE, DataSHIELD analysis
Components: RStudio (Amazon), Opal + R (Groningen University), Opal + R (Imperial College), ...
CPTP, publication
- CPTP portal: portal.partnershipfortomorrow.ca
- 2 Opal servers, hosted in Ontario (On) and Quebec (Qc)
- 5 cohorts
- 1 harmonized dataset:
- CoreQx: 5 cohorts, ~200K participants
CPTP, publication
Components: Agate (OICR); Opal (On, Ab, Bc, Atl) (OICR); Opal (Qc) (CARTaGENE); Mica and Drupal with MongoDB and MySQL (OICR)
Maelstrom, publication
- Maelstrom: maelstrom-research.org
- Opal, Mica, Agate servers hosted at OICR
- 8 networks
- 130 studies
- 350 datasets
- 250K variables
Stack scalability
- Horizontal
- Vertical
Stack scalability (future)
- R/DataSHIELD
  - Opal: load balancing across multiple R servers
- Opal clustering
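Load balancing R sessions across several R servers can start from a simple round-robin policy. A minimal sketch of that technique; the server addresses are hypothetical (6311 is the default Rserve port), and this illustrates the idea only, not how Opal implements or plans to implement it.

```python
from itertools import cycle
from threading import Lock


class RoundRobinBalancer:
    """Hand out R servers in turn; safe to call from concurrent session requests."""

    def __init__(self, servers: list[str]) -> None:
        if not servers:
            raise ValueError("at least one R server is required")
        self._servers = cycle(servers)
        self._lock = Lock()

    def next_server(self) -> str:
        with self._lock:
            return next(self._servers)


balancer = RoundRobinBalancer(["r1.example.org:6311", "r2.example.org:6311"])
print([balancer.next_server() for _ in range(3)])
# ['r1.example.org:6311', 'r2.example.org:6311', 'r1.example.org:6311']
```

Round-robin ignores how busy each server is; a least-loaded policy would need each R server to report its active sessions.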
Stack deployment
Stack administration
- Command-line tools (Python) to automate administration tasks
- Built-in monitoring web services (memory, threads...)
- Logs
  - general-purpose logs
  - REST activity
  - DataSHIELD activity
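Monitoring web services make routine checks scriptable. A minimal sketch of such a check, which fetches a JSON status document and flags high memory use; the URL and the JSON fields (`heapUsed`, `heapMax`) are hypothetical and must be adapted to the server's actual monitoring response, and authentication is omitted.

```python
import json
from typing import Optional
from urllib.request import urlopen

# Hypothetical endpoint and field names: adapt to the server's actual response.
STATUS_URL = "https://opal.example.org/monitoring/status"


def fetch_status(url: str = STATUS_URL, timeout: float = 10.0) -> dict:
    """Fetch and decode a JSON status document (authentication omitted)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def memory_alert(status: dict, threshold: float = 0.9) -> Optional[str]:
    """Return a warning if used heap exceeds `threshold` of the maximum, else None."""
    ratio = status["heapUsed"] / status["heapMax"]
    if ratio > threshold:
        return f"memory at {ratio:.0%} of maximum"
    return None


# A scheduled job (e.g. cron) could run: print(memory_alert(fetch_status()))
sample = {"heapUsed": 950, "heapMax": 1000}
print(memory_alert(sample))  # memory at 95% of maximum
```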
OBiBa resources
- Source code: github.com/obiba
- Docker repository: registry.hub.docker.com/repos/obiba
- Documentation: wiki.obiba.org
- Issue tracking: jira.obiba.org
- Mailing list: obiba-users
By Yannick Marcon
OBiBa software stack: architecture, deployment, administration, use cases.