OBiBa Stack

obiba.org

OBiBa

Open source software for biobanks

  • Mission: software solutions for epidemiological data management and analysis
  • IT team of Maelstrom Research

The software stack

  • Onyx: collection
  • Opal: storage, curation
  • Mica: publication
  • DataSHIELD: analysis
  • Agate: central authentication

GPLv3 license

History and supporting projects

  • Timeline: 2008 to 2016
  • Software: Onyx, Opal, DataSHIELD, Mica
  • Supporting projects: CPTP, CLSA, BioSHaRE, Maelstrom, IALSA

Onyx

Electronic data capture to improve data quality

  • Participant interview
  • Data collection from
    • questionnaires
    • instruments (images, data points...)
    • electronic consent
    • sample collections
  • Data export to files or Opal server

Onyx architecture

  • Clients: browser, JWS
  • Server: Web UI (Wicket), JWS, Security (Shiro), Service (Spring), Persistence (Hibernate), SQL

Onyx security

  • Shiro realms
    • Onyx user directory
    • ...
  • Role-based permissions
    • administrator
    • participant manager
    • interviewer

Onyx deployment

  • Client
    • Any OS
    • Browser
    • Java (if data extraction from instrument)
  • Server
    • Linux (recommended)
    • Web application container (Tomcat, Jetty)
    • Java 7
    • MySQL (others not tested)

Opal

Central data repository for data curation, analysis, and harmonization

  • Data import from files, Opal or LimeSurvey
  • Data storage in SQL and/or MongoDB
  • Data transformation, curation and harmonization using derived variables
  • Data export
  • Participant identifiers management
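
To illustrate harmonization through derived variables, the sketch below recodes two study-specific variables into one harmonized variable. It is plain Python, not Opal's own scripting; the study names, variable names and codes are all hypothetical.

```python
# Hypothetical study-specific codings of a smoking-status variable.
RECODE_RULES = {
    "study_a": {
        "variable": "SMOKE",
        "mapping": {1: "never", 2: "former", 3: "current"},
    },
    "study_b": {
        "variable": "smk_status",
        "mapping": {"N": "never", "X": "former", "Y": "current"},
    },
}


def derive_smoking_status(study, record):
    """Return the harmonized value for one record, or None if missing or unmapped."""
    rule = RECODE_RULES[study]
    raw_value = record.get(rule["variable"])
    return rule["mapping"].get(raw_value)


harmonized = [
    derive_smoking_status("study_a", {"SMOKE": 3}),
    derive_smoking_status("study_b", {"smk_status": "N"}),
    derive_smoking_status("study_b", {"smk_status": "?"}),
]
print(harmonized)
```

The collected values are left untouched; the harmonized variable is computed from them, so a recoding rule can be corrected without re-importing data.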

Opal architecture

  • Clients: browser, R, Python, ...
  • Server: Web Server (Jetty), Web UI (GWT), Web Services (REST), Security (Shiro), Service (Spring), Search (Elasticsearch), R/DataSHIELD, Fs, Persistence
  • Other: SQL, MongoDB, ..., R (server)

Opal security

  • Shiro realms
    • Opal user directory
    • ...
  • Access control list (ACL) over REST resources
    • View dictionary and summaries
    • Edit dictionary and view summaries
    • View dictionary and values
    • Administrate
    • DataSHIELD
    • ...
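
A sketch of how an access control list over REST resources could be looked up. The permission labels are those listed above; the resource paths, the subjects and the rule that only two of the levels expose individual values are assumptions of this example, not Opal's actual implementation.

```python
from enum import Enum


class Permission(Enum):
    VIEW_DICTIONARY_AND_SUMMARIES = "View dictionary and summaries"
    EDIT_DICTIONARY_AND_VIEW_SUMMARIES = "Edit dictionary and view summaries"
    VIEW_DICTIONARY_AND_VALUES = "View dictionary and values"
    ADMINISTRATE = "Administrate"


# Assumption of this sketch: only these levels expose individual-level values.
VALUE_READERS = {Permission.VIEW_DICTIONARY_AND_VALUES, Permission.ADMINISTRATE}

# ACL entries: (subject, REST resource path) -> granted permission.
# Subjects and paths are hypothetical.
ACL = {
    ("alice", "/datasource/cohort/table/baseline"): Permission.VIEW_DICTIONARY_AND_VALUES,
    ("bob", "/datasource/cohort/table/baseline"): Permission.VIEW_DICTIONARY_AND_SUMMARIES,
}


def can_read_values(subject, resource):
    """True if the subject's grant on this resource exposes individual values."""
    return ACL.get((subject, resource)) in VALUE_READERS


print(can_read_values("alice", "/datasource/cohort/table/baseline"))
print(can_read_values("bob", "/datasource/cohort/table/baseline"))
```

A subject with no entry for a resource is denied by default.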

Opal deployment

  • Client
    • Any OS
    • Browser
  • Server
    • Linux (preferred): Debian, Red Hat or zip package
    • Java 8
    • MySQL, PostgreSQL and/or MongoDB
    • R server (optional)

DataSHIELD

Distributed statistical analysis without having access to individual data

  • Implemented using R and Opal
    • dataset access control
    • dataset pushed from the database to R
    • limited R commands
  • R packages are developed by Paul Burton's team with OBiBa technical support
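
The idea of analysis without access to individual data can be sketched as follows: each server computes an aggregate next to its data, and the client only combines aggregates. This is plain Python rather than the DataSHIELD R packages; the function names, the sample values and the minimum-count rule are assumptions of this example.

```python
def server_summary(values, min_count=5):
    """Runs next to the data: returns aggregates only, never individual values.

    The min_count threshold is an assumed disclosure rule for this sketch.
    """
    if len(values) < min_count:
        raise ValueError("too few observations to disclose a summary")
    return {"n": len(values), "sum": sum(values)}


def pooled_mean(summaries):
    """Runs on the client: combines per-server aggregates into one mean."""
    total_n = sum(s["n"] for s in summaries)
    total_sum = sum(s["sum"] for s in summaries)
    return total_sum / total_n


# Hypothetical data held by two separate servers.
site_1 = [50, 60, 70, 80, 90]
site_2 = [60, 60, 60, 60, 60]

summaries = [server_summary(site_1), server_summary(site_2)]
print(pooled_mean(summaries))  # 65.0
```

Only the `n` and `sum` of each site cross the network, which is enough to obtain the same mean as a pooled analysis.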

DataSHIELD architecture

  • Clients: R (client)
  • Servers: Opal 1 ... Opal n, each with its own R server (R (server) 1 ... R (server) n)

DataSHIELD security

  • Authentication by Opal or Agate
  • Permissions by Opal
  • Required permissions
    • Dataset: View dictionary and summaries
    • Operation: DataSHIELD
  • Opal verifies that each R command is permitted
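
The command check can be pictured as a whitelist lookup on the functions a command calls. The sketch below is a simplification in Python: the whitelist content is hypothetical, and a regular expression stands in for real parsing of the R expression.

```python
import re

# Hypothetical whitelist of permitted R functions.
ALLOWED_FUNCTIONS = {"mean", "length", "summary"}

# Matches an R-style function name immediately followed by "(".
CALL_PATTERN = re.compile(r"([A-Za-z_.][A-Za-z0-9_.]*)\s*\(")


def is_command_permitted(command):
    """Permit a command only if it calls functions and all are whitelisted."""
    called = CALL_PATTERN.findall(command)
    return bool(called) and all(name in ALLOWED_FUNCTIONS for name in called)


print(is_command_permitted("mean(D$age)"))
print(is_command_permitted("print(D$age)"))
print(is_command_permitted("mean(head(D$age))"))
```

A command that calls no function at all, such as a bare `D$age`, is also rejected, since it would return individual values.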

DataSHIELD deployment

  • Client
    • Any OS
    • R
    • DataSHIELD R packages
  • Server
    • Opal
    • R server
    • DataSHIELD R packages

Mica

Study and dataset publication

  • Study consortium data portal
    • Network
    • Study
    • Dataset
    • Variable
    • Data access requests
  • Search engine
  • Data summaries and queries

Mica architecture

  • Clients: browser, CMS (Drupal), Python
  • Server: Web Server (Jetty), Web UI (AngularJS), Web Services (REST), Security (Shiro), Service (Spring), Search (Elasticsearch), Opal (client), Git, Persistence
  • Other: MongoDB, Opal (server)

Mica security

  • Shiro realms
  • ACL-based permissions

Mica deployment

  • Client
    • Any OS
    • Browser
  • Server
    • Drupal
    • Linux (preferred): Debian, Red Hat or zip package
    • Java 8
    • MongoDB
    • Opal
    • Agate

Agate

Central authentication service

  • ID provider service
    • authentication
    • user profile
    • sign up, forgot/reset password
    • OAuth2, OpenID Connect
    • single sign-on
  • Email notifications service
    • template-based

Agate architecture

  • Clients: browser, CMS (Drupal), Python
  • Server: Web Server (Jetty), Web UI (AngularJS), Web Services (REST), Security (Shiro), Service (Spring), Templates, Persistence
  • Other: MongoDB

Agate security

  • Shiro realms
  • Role-based permissions
  • Standards
    • OAuth2
    • OpenID Connect
    • JSON Web Token (JWT)
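
For readers unfamiliar with JSON Web Tokens, here is a minimal sketch of signing and verifying one with the Python standard library. It assumes the HS256 algorithm and a shared secret; the slides do not say which algorithm or claims Agate uses, so the claim names and the secret are hypothetical. Expiry and header checks are left out for brevity.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(claims: dict, secret: bytes) -> str:
    """Build header.payload.signature using HMAC-SHA256 (HS256)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("utf-8"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def verify_token(token: str, secret: bytes):
    """Return the claims if the signature is valid, otherwise None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        given = _b64url_decode(signature_b64)
    except ValueError:
        return None
    signing_input = f"{header_b64}.{payload_b64}"
    expected = hmac.new(secret, signing_input.encode("utf-8"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, given):
        return None
    return json.loads(_b64url_decode(payload_b64))


# Hypothetical claims and secret.
token = sign_token({"sub": "jdoe", "groups": ["mica-user"]}, b"shared-secret")
print(verify_token(token, b"shared-secret"))
print(verify_token(token, b"wrong-secret"))
```

Because the token carries its own signed claims, a server that knows the secret can validate a user without calling back to the authentication service on every request.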

Agate deployment

  • Client
    • Any OS
    • Browser
  • Server
    • Linux (preferred): Debian, Red Hat or zip package
    • Java 8
    • MongoDB

User stories

  • CPTP, data collection
  • CLSA, data collection
  • CLSA, data storage and curation
  • BioSHaRE, data publication and analysis
  • CPTP, data publication
  • Maelstrom, meta-data publication

CPTP, data collection

  • 5 cohorts
  • For each cohort
    • 1 Opal
    • 1 Onyx per data collection site (DCS)
    • No Internet

CPTP, data collection (diagram)

  • Onyx at DCS 1, Onyx at DCS 2
  • Encrypted files
  • Opal
  • Other data sources

CLSA, data collection

  • LimeSurvey for phone interviews
  • Onyx for in-home interviews
  • Onyx in data collection sites
  • 1 Opal for storage

CLSA, data collection (diagram)

  • Some in-home answers are used in DCS
  • Components: LimeSurvey, Onyx in-home 1, Onyx DCS 1, Opal (McMaster University), other data sources
  • Steps numbered 1, 2, 3

CLSA, storage and curation

  • Central Opal server at McMaster University
  • Analysis Opal server at McGill University
  • Nightly imports from McMaster to McGill
  • R server at McGill

CLSA, storage and curation (diagram)

  • 30K participants, 8K variables, 50+ datasets, ~10 TB of data
  • Opal (McMaster University)
  • Opal (McGill University), R
  • RAID 5

BioSHaRE, publication and analysis

  • BioSHaRE portal: bioshare.eu
  • 8 Opal+R servers hosted and managed by European universities
  • 14 cohorts
  • 2 harmonized datasets:
    • HOP: 10 cohorts, ~200K participants
    • ECP: 5 cohorts, ~750K participants

BioSHaRE, publication

  • Mica (Amazon)
  • Opal (Groningen University), Opal (Imperial College), ...

BioSHaRE, DataSHIELD analysis

  • RStudio (Amazon)
  • Opal + R (Groningen University), Opal + R (Imperial College), ...

CPTP, publication

  • Components: Agate, Mica, Drupal, MySQL, MongoDB
  • Opal (ON, AB, BC, Atl), Opal (QC)
  • Hosting sites: OICR, CARTaGENE

Maelstrom, publication

  • Maelstrom: maelstrom-research.org
  • Opal, Mica, Agate servers hosted at OICR
    • 8 networks
    • 130 studies
    • 350 datasets
    • 250K variables

Stack scalability

Stack scalability (future)

  • R/DataSHIELD
    • Opal: load balancing on multiple R servers
  • Opal clustering
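
One possible reading of load balancing on multiple R servers is a simple round-robin dispatch, sketched below. The slides do not state which strategy is planned, so round-robin is an assumption, and the class name and host names are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hands out R servers in turn so that sessions are spread evenly."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one R server is required")
        self._cycle = itertools.cycle(servers)

    def next_server(self):
        """Return the server that should receive the next R session."""
        return next(self._cycle)


# Hypothetical host names.
balancer = RoundRobinBalancer(["r-server-1", "r-server-2", "r-server-3"])
assigned = [balancer.next_server() for _ in range(4)]
print(assigned)
```

Round-robin ignores how busy each server is; a production setup would more likely weigh servers by current load or memory use.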

Stack deployment

  • Linux (recommended)
    • Debian packages
    • RPM packages
    • zip packages
  • Images

Stack administration

  • Command line tools (Python) to automate administration tasks
  • Built-in monitoring web services (memory, threads...)
  • Logs
    • general purpose logs
    • REST activity
    • DataSHIELD activity

OBiBa resources