CALYPR: Computation Analysis and Learning Yielding Predictions and Reports
Challenge: Data Silos + Duplications + Interoperability
Data can become fragmented, locked in silos, duplicated unnecessarily, and difficult to integrate.
[Diagram labels: lab A ... lab Z, Data Lake, UI]
Clark et al., "AI-readiness for Biomedical Data: Bridge2AI Recommendations"
CALYPR: Originally launched as the International Data Exchange and Analysis Platform (ACED‑IDP)
A multi‑year, multi‑institution & cross‑department collaboration
Management
ETL
Standards
Data Science Garage: Data Standards + ETL + Management
CALYPR: Extending Gen3 with Microservices for Bioinformatics Data Storage, Querying & Management
Modular Microservices
BMEG Iceberg + FHIR Graph Schema - A FAIR framework for capturing study design questions:
Which specimens in a pancreatic cancer cohort have an EGFR mutation status?
Gene
Allele
Protein
Transcript
Pathway
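Sketch: a study design question as a graph traversal
The question above can be read as a walk from specimens to alleles to genes. The snippet below is a minimal illustration on a toy in-memory graph, not the BMEG or GRIP query API. The vertex IDs, the edge labels `has_allele` and `in_gene`, and the helper names are all assumptions made up for this example, and it interprets the question as "specimens carrying an allele in EGFR".

```python
from collections import defaultdict

# Toy graph: every ID and edge below is invented for illustration only.
# Each edge is (source vertex, edge label, target vertex).
EDGES = [
    ("specimen/S1", "has_allele", "allele/A1"),
    ("specimen/S2", "has_allele", "allele/A2"),
    ("specimen/S3", "has_allele", "allele/A3"),
    ("allele/A1", "in_gene", "gene/EGFR"),
    ("allele/A2", "in_gene", "gene/KRAS"),
    ("allele/A3", "in_gene", "gene/EGFR"),
]


def build_index(edges):
    """Map (source, label) to the list of target vertices."""
    index = defaultdict(list)
    for src, label, dst in edges:
        index[(src, label)].append(dst)
    return index


def specimens_with_mutation_in(gene_id, edges):
    """Return specimens that reach `gene_id` via has_allele then in_gene."""
    index = build_index(edges)
    specimens = sorted({src for src, label, _ in edges if label == "has_allele"})
    hits = []
    for specimen in specimens:
        alleles = index[(specimen, "has_allele")]
        genes = {gene for allele in alleles for gene in index[(allele, "in_gene")]}
        if gene_id in genes:
            hits.append(specimen)
    return hits


egfr_specimens = specimens_with_mutation_in("gene/EGFR", EDGES)
print(egfr_specimens)
```

A real deployment would express the same walk in the graph engine's own query language; the point here is only the shape of the traversal.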
CALYPR: Data Ingestion + Integration
CALYPR: Landing & Main App Page
CALYPR: Discovery & Summary Reports
CALYPR: File Viewer
CALYPR: Data Explorer - Cohort Builder
CALYPR: Data Explorer - Temporal Data Tracking
CALYPR: Image Viewer
CALYPR: Connecting Applications - Future Direction
Nasim Sanati
Computational Biologist
@teslajoy
Quinn Wai Wong
Research Software Eng.
@quinnwai
Kyle Ellrott
PI + Associate Professor
@kellrott
Liam Beckman
Research Software Eng.
@lbeckman314
Brian Walsh
Sr. Research Software Eng.
@bwalsh
Jordan Tagle
Computational Biologist
@jordan2lee
Matthew Peterkort
Research Software Eng.
@matthewpeterkort
Allison Creason
Sr. Computational Biologist
Jay Egger
Computational Biologist
Luke Sargent
Sr. Research Software Eng.
HTAN: Bidirectional Transformer between FHIR and HTAN schema.
GRIP & FHIR
Querying TCGA BRCA Data with subtype labels for classification
Subtype Classification with Neural Networks
ML modeling and Shapley Feature Extraction
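Sketch: exact Shapley values for a small model
To make the Shapley step concrete, the snippet below computes exact Shapley values by enumerating feature subsets. It is an illustrative sketch only: the linear `toy_model`, its weights, the sample, and the all-zero baseline are assumptions invented for this example and are not the team's classifier or data. Exact enumeration is only practical for a handful of features; larger models typically rely on approximations.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of each feature of `x` relative to `baseline`."""
    n = len(x)
    features = range(n)

    def value(subset):
        # Features in `subset` keep their real value; the rest use the baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in features]
        return predict(mixed)

    phi = []
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {i}) - value(coalition))
        phi.append(total)
    return phi


# Hypothetical stand-in for a trained model: a fixed linear score.
WEIGHTS = [2.0, -1.0, 0.5]


def toy_model(features):
    return sum(w * v for w, v in zip(WEIGHTS, features))


sample = [1.0, 3.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, sample, baseline)
print(phi)
```

For a linear model with a zero baseline, each attribution should come out as weight times feature value, and the attributions should sum to the difference between the sample's prediction and the baseline's prediction.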
Pathway Enrichment Analysis
Extracting Biological Insights from Gene Set Results
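Sketch: over-representation test for one gene set
One common way to score pathway enrichment is a one-sided hypergeometric test on the overlap between a list of selected genes and a pathway. The snippet below is a minimal sketch of that test and is not necessarily the method used in this work. The gene identifiers `G1` to `G10` and the pathway membership are placeholders invented for the example.

```python
from math import comb


def enrichment_p_value(universe_size, pathway_size, hits_size, overlap):
    """P(X >= overlap) when drawing `hits_size` genes without replacement
    from a universe containing `pathway_size` pathway genes."""
    total = comb(universe_size, hits_size)
    upper = min(pathway_size, hits_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, hits_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def enrich(selected_genes, pathway_genes, universe):
    """Restrict both sets to the universe, then test their overlap."""
    selected = set(selected_genes) & universe
    pathway = set(pathway_genes) & universe
    overlap = len(selected & pathway)
    return enrichment_p_value(len(universe), len(pathway), len(selected), overlap)


# Placeholder data: 10 background genes, a 4-gene pathway, 3 selected genes.
universe = {f"G{i}" for i in range(1, 11)}
pathway = {"G1", "G2", "G3", "G4"}
selected = {"G1", "G2", "G9"}

p_value = enrich(selected, pathway, universe)
print(p_value)
```

When many gene sets are tested, the resulting p-values would normally be corrected for multiple testing before interpretation.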
Where We Work
Knight Cancer Research Building @ South Waterfront
Across from OMSI and right by Tilikum Crossing
Scientific Problem
How can we find actionable insights in a disease as complex as cancer?
Scientific Problem
How do we integrate data to extract meaningful insights?
ETL (Extract, Transform, Load)
FHIRizer – Transforms GDC, Cellosaurus, HTAN data into FHIR graph representation.
Snakemake & Airflow – Automate and orchestrate ETL workflows.
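Sketch: turning a source record into FHIR-style resources and graph edges
The transform step can be pictured as mapping a flat source record onto FHIR resources whose references become graph edges. The snippet below is a hedged illustration, not FHIRizer's actual code. The input field names (`case_id`, `sample_id`, `sample_type`) and both helper functions are assumptions for this example; the output uses only the standard FHIR `Patient` and `Specimen` fields `resourceType`, `id`, `subject`, and `type`.

```python
def to_fhir_resources(record):
    """Map one hypothetical source record to Patient and Specimen resources."""
    patient = {"resourceType": "Patient", "id": record["case_id"]}
    specimen = {
        "resourceType": "Specimen",
        "id": record["sample_id"],
        # FHIR references use the "ResourceType/id" form.
        "subject": {"reference": f"Patient/{patient['id']}"},
        "type": {"text": record["sample_type"]},
    }
    return [patient, specimen]


def reference_edges(resources):
    """Derive (source, field, target) edges from top-level FHIR references."""
    edges = []
    for resource in resources:
        source = f"{resource['resourceType']}/{resource['id']}"
        for field, value in resource.items():
            if isinstance(value, dict) and "reference" in value:
                edges.append((source, field, value["reference"]))
    return edges


# Invented example record; real source schemas differ.
record = {"case_id": "C-001", "sample_id": "S-001", "sample_type": "Primary Tumor"}
resources = to_fhir_resources(record)
edges = reference_edges(resources)
print(edges)
```

Keeping the mapping and the edge extraction as separate functions lets each be run as its own step in a workflow tool.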
Bringing It All Together with Python
Data Standards, ETL, and Management
GRaph Integration Platform (GRIP) – A graph query engine designed to span multiple database backends.
Gen3 – Data commons for storage, querying, and management.
Standards
Pydantic – Validates and enforces FHIR schema.
GA4GH (Global Alliance for Genomics & Health) – Establishes global standards for genomic data interoperability.
Pathway Databases (Pathway Commons, Reactome) – Structure & standardize biological context through molecular interaction graphs.
Data Science Garage team
Bringing it all together with data standards + extract, transform, load (ETL) + management