Calypr: Computation Analysis and Learning Yielding Predictions and Reports
Challenge: Data Silos + Duplications + Interoperability
Data can become fragmented, locked in silos, duplicated unnecessarily, and difficult to integrate.
Diagram labels: lab A ... lab Z; Data Lake; UI; Search API UI
Clark et al., AI-readiness for Biomedical Data: Bridge2AI Recommendations
Management
ETL
Standards
Data Science Garage: Data Standards + ETL + Management
Calypr: Extending Gen3 with Microservices for Bioinformatics Data Storage, Querying & Management
Modular microservices
BMEG Iceberg + FHIR graph schema
Gene
Allele
Protein
Transcript
Pathway
Calypr: Data Ingestion + Integration
Calypr: Landing and Main App Page
Calypr: Discovery and Summary Reports page
Calypr: File Viewer App
Calypr: Data Explorer - Cohort Builder
Calypr: Data Explorer - Temporal Data Tracking
Calypr: Image Viewer
Calypr: Connecting Applications
Nasim Sanati
Computational Biologist
@teslajoy
Quinn Wai Wong
Research Software Eng.
@quinnwai
Kyle Ellrott
PI + Associate Professor
@kellrott
Liam Beckman
Research Software Eng.
@lbeckman314
Brian Walsh
Sr. Research Software Eng.
@bwalsh
Jordan Tagle
Computational Biologist
@jordan2lee
Matthew Peterkort
Research Software Eng.
@matthewpeterkort
Allison Creason
Sr. Computational Biologist
Jay Egger
Computational Biologist
HTAN: Bidirectional Transformer between FHIR and HTAN schema.
GRIP & FHIR
Querying TCGA BRCA Data with subtype labels for classification
Subtype Classification with Neural Networks
ML modeling and Shapley Feature Extraction
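To make the Shapley step concrete, here is a minimal sketch of exact Shapley values computed by enumerating feature subsets. It is an illustration only, not the team's pipeline: the `shapley_values` helper, the toy `additive_score` function, and the per-gene weights are all made up, and exact enumeration is only feasible for a handful of features (real models typically rely on approximations such as those in the SHAP library).

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley value of each feature for a set function `value`.

    `value` takes a frozenset of feature names and returns a number.
    Cost grows as 2^n, so this is only practical for toy feature counts.
    """
    n = len(features)
    phi = {}
    for feature in features:
        others = [f for f in features if f != feature]
        total = 0.0
        for k in range(n):
            # Weight of every coalition of size k that excludes `feature`.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                coalition = frozenset(subset)
                marginal = value(coalition | {feature}) - value(coalition)
                total += weight * marginal
        phi[feature] = total
    return phi


# Hypothetical per-gene contributions to a subtype score (made-up numbers).
TOY_WEIGHTS = {"ESR1": 2.0, "ERBB2": 1.0, "MKI67": 0.5}


def additive_score(coalition):
    """Toy model: the score is the sum of the weights of the genes present."""
    return sum(TOY_WEIGHTS[g] for g in coalition)


toy_phi = shapley_values(list(TOY_WEIGHTS), additive_score)
```

For a purely additive model like this toy one, each feature's Shapley value equals its own weight, which makes it a convenient sanity check before moving to a trained classifier.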
Pathway Enrichment Analysis
Extracting Biological Insights from Gene Set Results
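One common way to run a pathway enrichment analysis is an over-representation test based on the hypergeometric distribution. The sketch below assumes that approach; the slides do not say which method the team uses. The function name `enrichment_p_value` and its arguments are hypothetical, and both gene lists are assumed to be drawn from the same background set.

```python
from math import comb


def enrichment_p_value(hits, pathway, background_size):
    """Over-representation test for one pathway.

    Returns (k, p) where k is the number of hit genes in the pathway and
    p = P(X >= k) for X ~ Hypergeometric(N=background_size,
    K=len(pathway), n=len(hits)).
    Assumes every gene in `hits` and `pathway` belongs to the background.
    """
    hits, pathway = set(hits), set(pathway)
    k = len(hits & pathway)
    K, n, N = len(pathway), len(hits), background_size
    upper = min(K, n)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return k, tail / comb(N, n)
```

With many pathways tested at once, the resulting p-values would normally be corrected for multiple testing before interpretation.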
Where We Work
Knight Cancer Research Building @ South Waterfront
Across from OMSI and right by Tilikum Crossing
Scientific Problem
How can we find actionable insights in a disease as complex as cancer?
Scientific Problem
How do we integrate data to extract meaningful insights?
ETL (Extract, Transform, Load)
FHIRizer – Transforms GDC, Cellosaurus, and HTAN data into a FHIR graph representation.
Snakemake & Airflow – Automate and orchestrate ETL workflows.
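The extract, transform, load pattern named above can be sketched in a few lines of Python. This is not FHIRizer's code or API: the CSV columns, the `extract`/`transform`/`load` function names, and the case and sample identifiers are invented for illustration, and the output covers only a small subset of the FHIR Patient and Specimen fields.

```python
import csv
import io
import json

# Hypothetical source table; column names and values are made up.
RAW_CSV = """case_id,sex,sample_id,sample_type
CASE-1,female,S-1,Primary Tumor
CASE-1,female,S-2,Blood Derived Normal
CASE-2,male,S-3,Primary Tumor
"""


def extract(text):
    """Extract: parse the source CSV into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: map rows to FHIR-style Patient and Specimen resources.

    Keying by "ResourceType/id" de-duplicates patients with several samples.
    """
    resources = {}
    for row in rows:
        patient_ref = f"Patient/{row['case_id']}"
        resources[patient_ref] = {
            "resourceType": "Patient",
            "id": row["case_id"],
            "gender": row["sex"],
        }
        resources[f"Specimen/{row['sample_id']}"] = {
            "resourceType": "Specimen",
            "id": row["sample_id"],
            "type": {"text": row["sample_type"]},
            # The reference is the graph edge from Specimen to Patient.
            "subject": {"reference": patient_ref},
        }
    return list(resources.values())


def load(resources):
    """Load: serialize as newline-delimited JSON, one resource per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in resources)


ndjson = load(transform(extract(RAW_CSV)))
```

In a real pipeline, each of these three functions would typically become its own Snakemake rule or Airflow task so that steps can be rerun and scheduled independently.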
Bringing It All Together with Python
Data Standards, ETL, and Management
GRaph Integration Platform (GRIP) – A graph query engine designed to span multiple database backends.
Gen3 – Data commons for storage, querying, and management.
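To illustrate what a graph query over integrated data looks like, the sketch below walks a tiny in-memory graph from a gene to its transcripts and on to proteins. It does not use GRIP or its client library; the `Graph` class, the edge labels, and the vertex identifiers are all hypothetical stand-ins for the kind of traversal a graph query engine performs.

```python
class Graph:
    """A tiny in-memory labeled graph (illustrative, not GRIP)."""

    def __init__(self):
        self.labels = {}      # vertex id -> vertex label
        self.out_edges = {}   # vertex id -> list of (edge label, target id)

    def add_vertex(self, vid, label):
        self.labels[vid] = label

    def add_edge(self, src, edge_label, dst):
        self.out_edges.setdefault(src, []).append((edge_label, dst))

    def follow(self, start_ids, edge_label):
        """Return the distinct targets of `edge_label` edges, in visit order."""
        found = []
        for vid in start_ids:
            for label, dst in self.out_edges.get(vid, []):
                if label == edge_label and dst not in found:
                    found.append(dst)
        return found


# Made-up identifiers for a Gene -> Transcript -> Protein chain.
g = Graph()
g.add_vertex("gene:G1", "Gene")
g.add_vertex("tx:T1", "Transcript")
g.add_vertex("tx:T2", "Transcript")
g.add_vertex("prot:P1", "Protein")
g.add_edge("gene:G1", "transcripts", "tx:T1")
g.add_edge("gene:G1", "transcripts", "tx:T2")
g.add_edge("tx:T1", "protein", "prot:P1")

transcripts = g.follow(["gene:G1"], "transcripts")
proteins = g.follow(transcripts, "protein")
```

Chaining `follow` calls mirrors how a traversal-style query is built up one hop at a time, whichever backend stores the vertices and edges.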
Standards
Pydantic – Validates and enforces FHIR schema.
GA4GH (Global Alliance for Genomics & Health) – Establishes global standards for genomic data interoperability.
Pathway Databases (Pathway Commons, Reactome) – Structure & standardize biological context through molecular interaction graphs.
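As a rough illustration of schema validation with Pydantic, the sketch below checks a record against a heavily simplified Patient model. The `Identifier` and `Patient` classes and the `validate_patient` helper are assumptions written for this example; they cover only a few fields and are not the team's models or a full FHIR schema.

```python
from typing import List, Literal, Optional

from pydantic import BaseModel, ValidationError


class Identifier(BaseModel):
    """Simplified FHIR Identifier: a value scoped by a naming system."""
    system: str
    value: str


class Patient(BaseModel):
    """Heavily simplified FHIR Patient (illustrative subset of fields)."""
    resourceType: Literal["Patient"]
    id: str
    identifier: List[Identifier] = []
    # FHIR administrative gender codes.
    gender: Optional[Literal["male", "female", "other", "unknown"]] = None


def validate_patient(record: dict) -> Optional[Patient]:
    """Return a Patient if `record` conforms to the model, else None."""
    try:
        return Patient(**record)
    except ValidationError:
        return None
```

Rejecting nonconforming records at this stage keeps malformed data from reaching the graph, where errors are harder to trace back to their source.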
Data Science Garage team
Bringing it all together with data standards + extract, transform, load (ETL) + management
Management
ETL
Standards