Looking Beyond Risk Factors:
Generative Bio-AI for Proactive Point-of-Care Early Diagnosis and Reduced Screen Failures in ILD and PF
Ishanu Chattopadhyay, PhD
Assistant Professor of Medicine
University of Chicago
ishanu@uchicago.edu
ishanu@paraknowledge.ai
University of Chicago Medicine
The Laboratory for Zero Knowledge Discovery
mathematics
computer science
social science
medicine
D3M (I2O)
PAI (DSO)
PREEMPT (BTO)
YFA (DSO)
FUNDING
Prognosis at Point-of-Diagnosis
Patient Journey
Early Diagnosis
Reduce screen failure rates
Holistic health surveillance
Predict antifibrotics continuation
improve outcomes
1
2
3
Interstitial Lung Disease / Pulmonary Fibrosis
Rapid Universal Point-of-care Screening for ILD/IPF Using Comorbidity Signatures in Electronic Health Records
Flag patients before they (or doctors) suspect
Primary Care
Pulmonologist
Zero-burden Co-morbid Risk Score (ZCoR)
Referral
shortness of breath
dry cough
doctor can hear velcro crackles
Non-specific Symptoms
>50 years old
more men than women
IPF
Rare disease
~5 in 10,000
Post-Dx
Survival
~4 years
Cannot always be seen on CXR
At least one misdiagnosis
~55%
Two or more misdiagnoses
38%
Initially attributed to age-related symptoms:
72%
PCP workflow demands
Known Co-morbidities of PF
Are there more? Subtle footprints in the medical history that are more heterogeneous?
Current survival: ~4 yrs
current clinical DX
ZCoR screening
Onishchenko, D., Marlowe, R.J., Ngufor, C.G. et al. Screening for idiopathic pulmonary fibrosis using comorbidity signatures in electronic health records. Nat Med 28, 2107–2116 (2022). https://doi.org/10.1038/s41591-022-02010-y
n=~3M
AUC~90%
Likelihood ratio ~30
Conventional AI/ML attempts to model the physician
AI in IPF Research
ICD administrative codes
IPF
ILD
target codes appear
Past medical history
No target codes appear
case
control
2yrs
2yrs
prediction
target codes appear
Past medical history
No target codes appear
case
control
2yrs
2yrs
IPF drugs prescribed
Signature of IPF diagnostic sequence
pirfenidone or nintedanib
ICD Codes can be noisy
"cases" are not always true IPF
Truven MarketScan (IBM) Commercial Claims & Encounters Database 2003-2018
>100M patients visible
>7B individual claims
>87K unique diagnostic codes
>7% Medicare data present
2,053,277 patients included in study
University of Chicago Medical Center 2012-2021
68,658 patients
Random sample from OptumLabs Data Warehouse, courtesy of Mayo Clinic
861,280 patients
2,983,215 patients
Data: Onishchenko et al., Nat Med 2022
Very high likelihood ratios achieved irrespective of subgroup
performance tables
Out-of-sample Results
specificity ~99%
NPV >99.9%
IPF
ILD
Comorbidity Spectra
patient A
patient B
patient C
Beyond "risk factors" to personalized risk patterns
False Positives:
Ethics:
For every 20-30 flags,
1 is positive
minimal
acceptable?
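The number of flags per true positive follows from Bayes' rule once a positive likelihood ratio and a pre-test prevalence are fixed. The sketch below works through that arithmetic. The helper names are hypothetical, and the prevalence values in the loop are illustrative placeholders, since the slides do not state the prevalence in the population actually screened; only the likelihood ratio of ~30 is taken from the results quoted earlier.

```python
def positive_predictive_value(prevalence: float, lr_positive: float) -> float:
    """Post-test probability of disease after a positive flag (Bayes' rule in odds form)."""
    pre_odds = prevalence / (1.0 - prevalence)
    post_odds = pre_odds * lr_positive
    return post_odds / (1.0 + post_odds)


def flags_per_true_positive(prevalence: float, lr_positive: float) -> float:
    """Expected number of flagged patients for each true case among them."""
    return 1.0 / positive_predictive_value(prevalence, lr_positive)


if __name__ == "__main__":
    # Illustrative prevalences only (assumption); likelihood ratio ~30 as quoted above.
    for prevalence in (0.0005, 0.001, 0.002):
        print(prevalence, round(flags_per_true_positive(prevalence, 30.0), 1))
```

The result is sensitive to the assumed prevalence, so any flags-per-positive figure should be read together with the age range and setting of the screened cohort.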
Better outcomes
Collard, Harold R., Alex J. Ward, Stephan Lanes, D. Cortney Hayflinger, Daniel M. Rosenberg, and Elke Hunsche. "Burden of illness in idiopathic pulmonary fibrosis." Journal of medical economics 15, no. 5 (2012): 829-835.
Clinical Trial Cohort Selection
Current screen failure rate ~50-60%
ZCoR-boosted screen failure rate ~20%
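To see what these rates mean for recruitment, here is a small worked calculation. It assumes every screened candidate either enrolls or screen-fails, and it uses a hypothetical enrollment target of 100 patients, which is not a figure from the study; `candidates_to_screen` is an illustrative helper name.

```python
import math


def candidates_to_screen(target_enrolled: int, screen_failure_rate: float) -> int:
    """Candidates that must be screened to enroll the target, given a failure rate in [0, 1)."""
    if not 0.0 <= screen_failure_rate < 1.0:
        raise ValueError("screen_failure_rate must be in [0, 1)")
    needed = target_enrolled / (1.0 - screen_failure_rate)
    # Round before taking the ceiling so float noise cannot add a spurious extra candidate.
    return math.ceil(round(needed, 9))


if __name__ == "__main__":
    # Hypothetical target of 100 enrolled patients (assumption).
    for rate in (0.6, 0.5, 0.2):
        print(f"failure rate {rate:.0%}: screen {candidates_to_screen(100, rate)} candidates")
```

Under these assumptions, moving from a 50-60% failure rate to about 20% cuts the screening workload for the same enrollment target by roughly 40-50%.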
Cloud Deployment
Theoretical formulation
Multi-cohort validation
Launch User-Accessible Platform
3 years
2 years
[
  {
    "patient_id": "P000038",
    "sex": "F",
    "birth_date": "01-01-2006",
    "DX_record": [
      {"date": "07-31-2006", "code": "Z38.00"},
      {"date": "08-07-2006", "code": "P59.9"},
      {"date": "08-29-2016", "code": "J01.90"},
      {"date": "09-10-2016", "code": "J01.90"},
      {"date": "11-14-2016", "code": "J01.91"}
    ],
    "RX_record": [
      {"date": "10-29-2011", "code": "rxLDA017"},
      {"date": "05-16-2015", "code": "rxIDG004"},
      {"date": "08-08-2015", "code": "rxIDG004"},
      {"date": "06-04-2016", "code": "rxIDD013"}
    ],
    "PROC_record": [
      {"date": "02-05-2007", "code": "90723"},
      {"date": "11-05-2007", "code": "J1100"}
    ]
  }
]
{
  "predictions": [
    {
      "error_code": "",
      "patient_id": "P000012",
      "predicted_risk": 0.005794344620009157,
      "probability": 0.8253881317184486
    }
  ],
  "target": "TARGET"
}
Data In
Data Out
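Before submitting records in the input format shown above, it can help to check them locally. The sketch below is one way to do that. The checks are assumptions inferred from the sample record (a `patient_id`, three event lists, MM-DD-YYYY dates, a non-empty code per event) and are not the API's documented validation rules; `validate_patient` is a hypothetical helper.

```python
from datetime import datetime

# Event lists seen in the sample input record.
RECORD_KEYS = ("DX_record", "RX_record", "PROC_record")


def validate_patient(patient: dict) -> list:
    """Return a list of problems found; an empty list means the record looks well-formed."""
    problems = []
    if not patient.get("patient_id"):
        problems.append("missing patient_id")
    for key in RECORD_KEYS:
        for i, event in enumerate(patient.get(key, [])):
            if not event.get("code"):
                problems.append(f"{key}[{i}]: missing code")
            try:
                # Dates in the sample are month-day-year, e.g. "07-31-2006".
                datetime.strptime(event.get("date", ""), "%m-%d-%Y")
            except (ValueError, TypeError):
                problems.append(f"{key}[{i}]: date is not MM-DD-YYYY")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to clean a large batch of records in a single pass.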
The Paraknowledge API
curl -X POST -H "Content-Type: application/json" -d '[{"patient_id": "P28109965201", "sex": "M", "age": 89, "fips": "35644", "DX_record": [{"date": "12-16-2011", "code": "R09.02"}, {"date": "12-30-2011", "code": "H04.129"}, {"date": "12-30-2011", "code": "H02.109"}], "RX_record": [], "PROC_record": [{"date": "09-28-2012", "code": "71100"}]}]' "https://us-central1-pkcsaas-01.cloudfunctions.net/zcor_predict?target=IPF&api_key=7eea9f70d79c408f2b69847d911303c"
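The same request can be sketched in Python using only the standard library. The endpoint and the `target` and `api_key` query parameters are taken from the curl command above; the function names, the timeout, and the `YOUR_API_KEY` placeholder are assumptions for illustration, and the response is assumed to be JSON shaped like the sample output.

```python
import json
import urllib.parse
import urllib.request

# Endpoint as it appears in the curl example.
ENDPOINT = "https://us-central1-pkcsaas-01.cloudfunctions.net/zcor_predict"


def build_request(patients: list, target: str, api_key: str) -> urllib.request.Request:
    """Build a POST request carrying a JSON list of patient records."""
    query = urllib.parse.urlencode({"target": target, "api_key": api_key})
    body = json.dumps(patients).encode("utf-8")
    return urllib.request.Request(
        f"{ENDPOINT}?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(patients: list, target: str, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON reply (performs a network call)."""
    request = build_request(patients, target, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    patient = {
        "patient_id": "P28109965201", "sex": "M", "age": 89, "fips": "35644",
        "DX_record": [{"date": "12-16-2011", "code": "R09.02"}],
        "RX_record": [],
        "PROC_record": [{"date": "09-28-2012", "code": "71100"}],
    }
    # Replace the placeholder with a real key before running.
    print(predict([patient], target="IPF", api_key="YOUR_API_KEY"))
```

Keeping request construction separate from the network call lets the payload be inspected or unit-tested without contacting the service.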
Current Targets
IPF
ILD
ADRD
CKD
CKD_SEVERE
MELANOMA
CANCER_PANCREAS
CANCER_UTERUS
SISA
Cohort Selection and Risk Analysis Testbed
Up to 4-year "signal" resolution
decreases risk
increases risk
Patient Journey: Tracking Risk over time
Risk decreases sometimes
new codes change trajectory as they are revealed
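One way to picture how a trajectory shifts as codes are revealed is to re-score the patient after each new diagnostic code. The sketch below does this with a pluggable scorer. The scorer shown is a placeholder with arbitrary per-code weights that carry no clinical meaning and is not the ZCoR model; all function and variable names are hypothetical.

```python
from datetime import datetime


def risk_trajectory(patient: dict, score_fn) -> list:
    """Re-score the patient after each diagnostic code becomes visible, in date order."""
    events = sorted(
        patient["DX_record"],
        key=lambda e: datetime.strptime(e["date"], "%m-%d-%Y"),
    )
    trajectory = []
    for n_visible in range(1, len(events) + 1):
        # Snapshot of the record containing only the first n_visible codes.
        snapshot = dict(patient, DX_record=events[:n_visible])
        trajectory.append((events[n_visible - 1]["date"], score_fn(snapshot)))
    return trajectory


# Placeholder scorer: arbitrary weights for illustration only, NOT the ZCoR model.
TOY_WEIGHTS = {"R09.02": 0.25, "H04.129": -0.125}


def toy_score(patient: dict) -> float:
    """Sum the placeholder weights of the visible codes (unknown codes count as 0)."""
    return sum(TOY_WEIGHTS.get(e["code"], 0.0) for e in patient["DX_record"])
```

In practice `score_fn` would wrap a call to a real risk model; the point of the sketch is only that each prefix of the history yields its own score, so the score can go down as well as up.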
Off-the-shelf AI does not suffice
Modeling Longitudinal Patterns
Specialized HMM models from code sequences
Model control and case cohorts separately
Given a new test case, compute the likelihood of the sample arising from case models vs. control models
sequence likelihood defect
Huang, Yi, Victor Rotaru, and Ishanu Chattopadhyay. "Sequence likelihood divergence for fast time series comparison." Knowledge and Information Systems 65, no. 7 (2023): 3079-3098.
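The case-versus-control comparison can be illustrated with a much simpler stand-in. The sketch below swaps the specialized HMMs for first-order Markov chains with add-one smoothing, fits one chain per cohort, and reports the log-likelihood gap for a new sequence. This is a simplification for intuition, not the published sequence likelihood defect computation; the three-symbol alphabet and the toy sequences are invented for illustration.

```python
import math


def fit_markov(sequences, alphabet, alpha=1.0):
    """Estimate first-order transition probabilities with additive smoothing."""
    counts = {a: {b: alpha for b in alphabet} for a in alphabet}
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {
        a: {b: counts[a][b] / sum(counts[a].values()) for b in alphabet}
        for a in alphabet
    }


def log_likelihood(seq, model):
    """Log-probability of the transitions in seq under the model."""
    return sum(math.log(model[prev][nxt]) for prev, nxt in zip(seq, seq[1:]))


def likelihood_gap(seq, case_model, control_model):
    """Positive values mean seq is better explained by the case model."""
    return log_likelihood(seq, case_model) - log_likelihood(seq, control_model)


if __name__ == "__main__":
    # Hypothetical encoding per time step: "0" no code, "1" target-related code, "2" other code.
    alphabet = "012"
    case_model = fit_markov(["00110111", "01101110"], alphabet)
    control_model = fit_markov(["00002000", "02000020"], alphabet)
    print(likelihood_gap("0111", case_model, control_model))
    print(likelihood_gap("0020", case_model, control_model))
```

Fitting the two cohorts separately, rather than training one discriminative classifier, mirrors the structure described on the slide: each cohort gets its own generative model and a new patient is scored by which model explains the history better.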
ZeD Lab: Predictive Screening from Comorbidity Footprints
Nature Medicine
JAHA
CELL Reports
Science Adv.
The ZCoR Approach: Rapidly Re-targetable
Condition | ZeD performance | Competition |
---|---|---|
Autism | >80% AUC at 2 yrs | "obvious" |
Alzheimer's Disease | ~90% AUC | 60-70% AUC |
Idiopathic Pulmonary Fibrosis | ~90% AUC | NA |
MACE | ~80% AUC | ~70% AUC |
Bipolar Disorder | ~85% AUC | NA |
CKD | ~85% AUC | NA |
Cancers (Prostate, Bladder, Uterus, Skin) | ~75-80% AUC | Low |
Deploy all/many/most of these!
Predictions at the Point-of-Diagnosis
Can my patient continue taking anti-fibrotics over long term?
Digital Twins for Health trajectories
1M parameters
Predicts disorders across the disease spectrum
Pre-empting Effectiveness of Antifibrotics at the point of diagnosis
~78% AUC
26-32 out of 100 discontinued
4-5 out of 100 discontinued
Prognosis at Point-of-Diagnosis
Patient Journey
Early Diagnosis
Reduce screen failure rates
Holistic health surveillance
Predict antifibrotics continuation
improve outcomes
Summary
3
2
1
ishanu@uchicago.edu
@ishanu_ch
ishanu@paraknowledge.ai
Take-Home Message
Conclusions
Q-Net
recursive forest
This is a general method!
Data
\(\downarrow \)
Set of interdependent
predictors
q-distance
a biologically informed, adaptive distance between strains
Smaller distances imply a quantitatively high probability of spontaneous jump
\(J\) is the Jensen-Shannon divergence
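The full q-distance formula is not reproduced in the slide text, so the sketch below covers only its stated building block: the Jensen-Shannon divergence between two discrete probability distributions, computed in bits. The function names are illustrative, and how the divergences are combined into the q-distance is left out deliberately.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence: average KL divergence of p and q to their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    print(jensen_shannon([0.5, 0.5], [0.5, 0.5]))
    print(jensen_shannon([0.9, 0.1], [0.1, 0.9]))
```

Unlike the KL divergence, this quantity is symmetric and bounded (between 0 and 1 in bits), which is what makes it a natural ingredient for a distance.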
Metric Structure
Tangent Bundle
geometry
dynamics
Is AI/ML adding anything of relevance?
"predicting" autism > 3yrs
"diagnosing" fibrosis from lung imaging
"diagnosing" dementia from brain scan
State of Art for Universal Screening
M54_72
M54_60
E78_72