Calypr: Computation Analysis and Learning Yielding Predictions and Reports

Challenge: Data Silos + Duplications + Interoperability
Data can become fragmented, locked in silos, duplicated unnecessarily, and difficult to integrate.

lab A ... lab Z

Data Lake
UI interface
Search API UI





Clark et al., "AI-readiness for Biomedical Data: Bridge2AI Recommendations"

Management
ETL
Standards




Data Science Garage: Data Standards + ETL + management
Calypr: Extending Gen3 with Microservices for Bioinformatics Data Storage, Querying & Management





Modular microservices
- Gen3 User: Build cohort and download
- Gen3 Developer: Build + integrate microservices
- Gen3 Operator: Spin up an instance + configure data + customize frontend

BMEG Iceberg + FHIR graph schema


Gene
Allele
Protein
Transcript
Pathway
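A minimal sketch of how entity types like these could be linked and traversed as a graph. Everything here is an assumption for illustration: the placeholder vertex ids, the edge labels (`variant_of`, `transcribed_to`, `translated_to`, `participates_in`), and the `reachable` helper are hypothetical and are not taken from the BMEG, GRIP, or FHIR schemas.

```python
from collections import deque

# Hypothetical toy graph: vertex ids and edge labels are illustrative only.
VERTICES = {
    "allele:A1": "Allele",
    "gene:G1": "Gene",
    "transcript:T1": "Transcript",
    "protein:P1": "Protein",
    "pathway:W1": "Pathway",
}

# Each edge is (source id, edge label, destination id).
EDGES = [
    ("allele:A1", "variant_of", "gene:G1"),
    ("gene:G1", "transcribed_to", "transcript:T1"),
    ("transcript:T1", "translated_to", "protein:P1"),
    ("protein:P1", "participates_in", "pathway:W1"),
]


def neighbors(vertex_id):
    """Return ids of vertices one outgoing edge away from vertex_id."""
    return [dst for src, _label, dst in EDGES if src == vertex_id]


def reachable(start, target_label):
    """Breadth-first search for vertices of target_label reachable from start."""
    seen = {start}
    queue = deque([start])
    hits = []
    while queue:
        current = queue.popleft()
        if current != start and VERTICES[current] == target_label:
            hits.append(current)
        for nxt in neighbors(current):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return hits


if __name__ == "__main__":
    # Which pathways can be reached starting from an allele?
    print(reachable("allele:A1", "Pathway"))
```

A real graph query engine would express the same walk as a declarative traversal over stored vertices and edges; the list scan here only shows the shape of the question being asked.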


Calypr: Data Ingestion + Integration








Calypr: Landing and Main App Page


- Data Explorer
- Storage Manager
- Custom Report Pages
- Data Viewer
- Interactive Computation
- Graph Queries
- Project Management
Calypr: Discovery and Summary Reports page


Calypr: File Viewer App


Calypr: Data Explorer - Cohort Builder




Calypr: Data Explorer - Temporal Data Tracking

Calypr: Image Viewer


Calypr: Connecting Applications

Nasim Sanati
Computational Biologist
@teslajoy
Quinn Wai Wong
Research Software Eng.
@quinnwai
Kyle Ellrott
PI + Associate Professor
@kellrott
Liam Beckman
Research Software Eng.
@lbeckman314


Brian Walsh
Sr. Research Software Eng.
@bwalsh

Jordan Tagle
Computational Biologist
@jordan2lee
Matthew Peterkort
Research Software Eng.
@matthewpeterkort

Data Science Garage Team + Creason's Team



Allison Creason
Sr. Computational Biologist
Jay Egger
Computational Biologist

Code + Stunning Sunrise + Local Coffee + Great Team

Extra Slides
HTAN: Bidirectional Transformer between FHIR and HTAN schema.


GRIP & FHIR
Querying TCGA BRCA Data with subtype labels for classification
Subtype Classification with Neural Networks
ML modeling and Shapley Feature Extraction
Pathway Enrichment Analysis
Extracting Biological Insights from Gene Set Results
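For the pathway enrichment step, one common approach is an over-representation test based on the hypergeometric distribution. The slides do not say which method is used, so the sketch below is only an assumed stand-in, and the function name and parameters are hypothetical. It computes the probability of seeing at least `overlap` pathway genes in a hit list drawn at random from the gene universe.

```python
from math import comb


def enrichment_p_value(universe_size, pathway_size, hit_list_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe_size: number of genes considered in total
    pathway_size:  number of those genes annotated to the pathway
    hit_list_size: number of genes in the list being tested
    overlap:       number of hit-list genes that are in the pathway
    """
    upper = min(pathway_size, hit_list_size)
    total = comb(universe_size, hit_list_size)
    tail = sum(
        comb(pathway_size, i) * comb(universe_size - pathway_size, hit_list_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers only: 10 genes in the universe, 4 in the pathway,
    # 3 in the hit list, 2 of which fall in the pathway.
    print(enrichment_p_value(10, 4, 3, 2))
```

In practice the resulting p-values would be corrected for multiple testing across all pathways tested before drawing conclusions.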

Where We Work
Knight Cancer Research Building @ South Waterfront


Across from OMSI and right by Tilikum Crossing



Scientific Problem
How can we find actionable insights in a disease as complex as cancer?

Scientific Problem
How do we integrate data to extract meaningful insights?
Bringing It All Together with Python
Data Standards, ETL, and Management
ETL (Extract, Transform, Load)
- FHIRizer – Transforms GDC, Cellosaurus, HTAN data into FHIR graph representation.
- Snakemake & Airflow – Automate and orchestrate ETL workflows.
Management
- GRaph Integration Platform (GRIP) – a powerful graph query engine designed to span across multiple database backends.
- Gen3 – Data commons for storage, querying, and management.
Standards
- Pydantic – Validates and enforces FHIR schema.
- GA4GH (Global Alliance for Genomics & Health) – Establishes global standards for genomic data interoperability.
- Pathway Databases (Pathway Commons, Reactome) – Structure & standardize biological context through molecular interaction graphs.
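The extract, transform, validate, and load steps listed above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not FHIRizer's actual code: the source field names (`case_id`, `gender`), the function names, and the in-memory `store` are hypothetical, and plain Python checks stand in for Pydantic so the sketch has no dependencies.

```python
# FHIR administrative-gender codes.
ALLOWED_GENDERS = {"male", "female", "other", "unknown"}


def extract(rows):
    """Extract: yield source records (here, already-parsed dicts)."""
    yield from rows


def transform(record):
    """Transform: map a flat source record to a minimal FHIR-style Patient dict."""
    gender = str(record.get("gender", "unknown")).lower()
    return {
        "resourceType": "Patient",
        "id": record["case_id"],  # hypothetical source field name
        "gender": gender if gender in ALLOWED_GENDERS else "unknown",
    }


def validate(resource):
    """Stand-in for schema validation; this sketch requires a type and an id."""
    if resource.get("resourceType") != "Patient":
        raise ValueError("unexpected resourceType")
    if not resource.get("id"):
        raise ValueError("this sketch requires a non-empty id")
    return resource


def load(resources, store):
    """Load: write validated resources into a dict keyed by resource id."""
    for resource in resources:
        store[resource["id"]] = resource


def run_etl(rows):
    """Run extract, transform, validate, and load over the input rows."""
    store = {}
    valid = [validate(transform(record)) for record in extract(rows)]
    load(valid, store)
    return store


if __name__ == "__main__":
    rows = [{"case_id": "case-1", "gender": "Female"}, {"case_id": "case-2"}]
    print(run_etl(rows))
```

Keeping each stage as its own function mirrors how a workflow tool could schedule the stages as separate steps.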




Data Science Garage team
Bringing it all together with data standards + extract transform load (ETL) + management
Management
ETL
Standards



Calypr: Data Explorer - Cohort Builder
Copy of Calypr_overview
By teslajoy




