AI and Machine Learning in Medicine:
Screening for Complex Diseases
Rapidly, Cheaply, and Early
Ishanu Chattopadhyay, PhD
Assistant Professor of Medicine
University of Chicago
ishanu@uchicago.edu
02.28.2024
Learning Objectives
What is AI/Machine Learning? What are the key applications in the context of medicine? What does it bring to the table in the context of Health Services and Bio-medicine? Are there new questions that we can answer? Does it suffice to draw on off-the-shelf models? What are the new/emerging ideas?
Application of AI in Biomedicine: Why We Need a “Bio”-AI.
Emerging tools for addressing Late and Missed Diagnosis in Primary Care
Why “risk factors” are often not predictive enough, and how to think about more personalized predictors of future risk of serious diseases
Universal Screening?
Zero-burden EHR Analytics
Diagnostics & Screening for complex disorders
*CoR: Comorbid Risk Scores
ACoR (Autism)
PCoR (IPF/ILD)
ZCoR (ADRD/AD)
ZCoR-C (cancers with further specialization)
Leverage Vast Patient EHR and Insurance Claims Database(s)
Truven MarketScan (IBM) Commercial Claims & Encounters Database 2003-2018
87M patients visible > 1 year
>7B individual claims
>87K unique diagnostic codes
>7% Medicare data present
Why are ML/AI models complicated, and non-transparent?
Individual data points are not so important
Tycho Brahe
(1546-1601)
Johannes Kepler (1571-1630)
Newtonian theory of Universal Gravitation (1684)
raw data
empirical fit
universal law of physics
30,000 experiments
Starting point of modern genetics
Mendel's Laws of Genetics
Johann Gregor Mendel (1822–1884)
Some datasets are large, but simple: easily compressible or representable
Others are not.
"big data" has irreducible complexity
Hence, "models" must have capacity to accommodate this complexity
Machine Learning and AI allow us to find "theories" that can no longer be specified as simple equations, but require billions of parameters to specify
Medical history
co-morbidities
lifestyle
genetics
environment
Estimate disease risk
Estimate prognosis
Reduce missed and delayed diagnosis
Find prodromal patients for clinical trials
The Age of Data
Autism Spectrum Disorder + AI
Idiopathic Pulmonary Fibrosis + AI
Literature Search: AI + Target Disease
Current AI Applications are limited in practice
Are ML predictions pertaining to clinical diagnoses adding anything of relevance?
Risk
The Key Stumbling Block: Features
How to find good features?
Good features
relevant risk factors
Rapid Universal Point-of-care Screening for ILD/IPF Using Comorbidity Signatures in Electronic Health Records
Flag patients before they (or doctors) suspect
Primary Care
Pulmonologist
?
Zero-burden Co-morbid Risk Score (ZCoR)
shortness of breath
dry cough
doctor can hear velcro crackles
Common Symptoms
>50 years old
more men than women
IPF
Rare disease
~5 in 10,000
Post-Dx
Survival
~4 years
At least one misdiagnosis
~55%
Two or more misdiagnoses
38%
Initially attributed to age related symptoms:
72%
Cannot always be seen on CXR
Non-specific symptoms
PCP workflow demands
[Timeline figure. Labels: ZCoR screening, ~4 yrs, current clinical Dx, current survival ~4 yrs]
Onishchenko, D., Marlowe, R.J., Ngufor, C.G. et al. Screening for idiopathic pulmonary fibrosis using comorbidity signatures in electronic health records. Nat Med 28, 2107–2116 (2022). https://doi.org/10.1038/s41591-022-02010-y
n=~3M
AUC~90%
Likelihood ratio ~30
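To make the likelihood ratio concrete, here is a minimal sketch of how a positive likelihood ratio of about 30 updates a pre-test probability through the odds form of Bayes' rule. Using the population prevalence quoted earlier in the deck (~5 in 10,000) as the pre-test probability is an assumption made for illustration; the study cohorts may have a different base rate, and the function name is hypothetical.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a probability with a likelihood ratio (odds form of Bayes' rule)."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Assumption: pre-test probability = the slide's ~5 in 10,000 prevalence.
prevalence = 5 / 10_000
# Likelihood ratio ~30, as quoted on the slide.
risk_after_flag = post_test_probability(prevalence, 30.0)
print(f"Post-test probability after a positive flag: {risk_after_flag:.4f}")
```

The same function applies to any pre-test probability, so the effect of screening a higher-risk subpopulation can be explored by changing `prevalence`.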
Conventional AI/ML attempts to model the physician
AI in IPF Research
[Figure: case/control definition from ICD administrative codes (IPF, ILD). Case: past medical history, then target codes appear. Control: past medical history, no target codes appear. Labels: 2 yrs, prediction.]
[Figure: the same case/control definition, with IPF drugs prescribed (pirfenidone or nintedanib) as the signature of an IPF diagnostic sequence.]
ICD Codes can be noisy
"cases" are not always true IPF
Truven MarketScan (IBM) Commercial Claims & Encounters Database 2003-2018
>100M patients visible
>7B individual claims
>87K unique diagnostic codes
>7% Medicare data present
2,053,277 patients included in study
University of Chicago Medical Center 2012-2021
68,658 patients
Random sample from OptumLabs Data Warehouse, courtesy of Mayo Clinic
861,280 patients
2,983,215 patients
Data: Onishchenko et al. Nat. Medicine 2022
performance tables
Marketscan Out-of-sample Results
specificity~99%
NPV>99.9%
IPF
ILD
performance tables
UCM Out-of-sample Results
specificity~99%
NPV>99.9%
IPF
ILD
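The relationship between specificity, NPV, and PPV for a rare disease can be sketched as follows. The specificity (~99%) and prevalence (~5 in 10,000) come from the slides; the sensitivity of 50% is an assumed operating point chosen only for illustration, and the function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Specificity and prevalence are taken from the slides.
# Assumption: sensitivity of 0.5 is illustrative, not a reported result.
ppv, npv = predictive_values(sensitivity=0.5, specificity=0.99, prevalence=5 / 10_000)
print(f"PPV = {ppv:.4f}, NPV = {npv:.5f}")
```

For a rare disease the NPV stays very high almost regardless of sensitivity, while the PPV is driven mainly by specificity and the base rate.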
Comorbidity Spectra
patient A
patient B
patient C
lesson 1
Beyond "risk factors" to personalized risk patterns
False Positives:
Ethics:
For every 20-30 flags,
1 is positive
minimal
acceptable?
Better outcomes
Collard, Harold R., Alex J. Ward, Stephan Lanes, D. Cortney Hayflinger, Daniel M. Rosenberg, and Elke Hunsche. "Burden of illness in idiopathic pulmonary fibrosis." Journal of medical economics 15, no. 5 (2012): 829-835.
Clinical Trial Cohort Selection
Current screen failure rate ~50-60%
ZCoR boosted screen failure rate ~20%
Longitudinal history is important
lesson 2
Off-the-shelf AI does not suffice
lesson 3
Leveraging Longitudinal Patterns
Specialized HMM models from code sequences
Model control and case cohorts separately
Given a new test case, compute the likelihood of the sample arising from the case models vs. the control models
sequence likelihood defect
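The idea of scoring a new patient by comparing likelihoods under separately trained case and control models can be sketched as below. This is not the lab's specialized HMM machinery: as a stand-in, it fits simple first-order Markov chains with Laplace smoothing, and the coarse code categories, toy sequences, and function names are all hypothetical.

```python
import math


def fit_markov(sequences, alphabet, alpha=1.0):
    """Fit a first-order Markov chain with Laplace smoothing (pseudo-count alpha)."""
    counts = {a: {b: alpha for b in alphabet} for a in alphabet}
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    model = {}
    for a in alphabet:
        total = sum(counts[a].values())
        model[a] = {b: counts[a][b] / total for b in alphabet}
    return model


def log_likelihood(seq, model):
    """Log-probability of the observed transitions in seq under the model."""
    return sum(math.log(model[p][n]) for p, n in zip(seq, seq[1:]))


def likelihood_gap(seq, case_model, control_model):
    """Positive values mean the sequence looks more case-like than control-like."""
    return log_likelihood(seq, case_model) - log_likelihood(seq, control_model)


# Hypothetical coarse diagnostic categories and toy training cohorts.
alphabet = ["resp", "cardio", "other"]
case_seqs = [["other", "resp", "resp", "resp"], ["resp", "resp", "cardio", "resp"]]
control_seqs = [["other", "other", "cardio", "other"], ["other", "other", "other", "cardio"]]

case_model = fit_markov(case_seqs, alphabet)
control_model = fit_markov(control_seqs, alphabet)

score = likelihood_gap(["other", "resp", "resp"], case_model, control_model)
control_like_score = likelihood_gap(["other", "other", "cardio"], case_model, control_model)
print(score, control_like_score)
```

Modeling the two cohorts separately means the score depends on the order of codes in the history, not just on which codes are present.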
ZeD Lab: Predictive Screening from Comorbidity Footprints
Nature Medicine
JAHA
CELL Reports
Science Adv.
1 in 59
Autism Spectrum Disorder
ASD: Ineffective screening causes delays and incurs costs
Autism Co-morbid Risk (ACoR) Score
Data: Onishchenko et al. Science Advances 2021
Autism Co-morbid Risk (ACoR) Score
MCHAT/F
Head to head comparison with current practice
Data: Onishchenko et al. Science Advances 2021
Joint Operation with MCHAT
CHOP Study allows us to see effectiveness of MCHAT in different sub-populations
Modulate sensitivity/specificity trade-offs
Data: Onishchenko et al. Science Advances 2021
The ZCoR Approach: Rapidly Re-targetable
Target disease | ZED performance | Competition |
---|---|---|
Autism | >80% AUC at 2 yrs | "obvious" |
Alzheimer's Disease | ~90% AUC | 60-70% AUC |
Idiopathic Pulmonary Fibrosis | ~90% AUC | NA |
MACE | ~80% AUC | ~70% AUC |
Bipolar Disorder | ~85% AUC | NA |
CKD | ~85% AUC | NA |
Cancers (Prostate, Bladder, Uterus, Skin) | ~75-80% AUC | Low |
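Since the table compares methods by AUC, a small sketch of what that number measures may help: the AUC is the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with ties counted as one half. The scores below are made-up values for illustration only.

```python
def auc(case_scores, control_scores):
    """Rank-based AUC: fraction of case/control pairs ordered correctly."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical risk scores, not study data.
case_scores = [0.9, 0.8, 0.4]
control_scores = [0.7, 0.3, 0.2, 0.1]
example_auc = auc(case_scores, control_scores)
print(f"AUC = {example_auc:.3f}")
```

An AUC of 50% corresponds to random ordering, which is why a gap such as ~67% vs. ~87% is substantial.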
Deploy all/many/most of these!
>5 Million in US. >13 Million in next 10 years
Alzheimer's Disease and Related Dementia
MoCA, Blood Tests
Current Practice:
state of art with EHR:
~67% AUC*
ZCoR: ~87%
Alzheimer's Disease and Related Dementia
state of art with EHR:
~67% AUC*
ZCoR: ~87%
Preempting ADRD accurately up to a decade in the future
Application to Suicide Attempts and Ideation (SISA), PTSD*:
perhaps surprising connection between mood disorders and physiological comorbidities
Gibbons RD, Kupfer D, Frank E, Moore T, Beiser DG, Boudreaux ED. Development of a Computerized Adaptive Test Suicide Scale-The CAT-SS. J Clin Psychiatry. 2017 Nov/Dec;78(9):1376-1382. doi: 10.4088/JCP.16m10922. PMID: 28493655.
* in press
Application to Malignant Neoplasms*
Melanoma
Melanoma has a high survival rate of over 90% when treated early. But if it progresses to later stages, the survival rate drops significantly. Identifying potentially life-threatening melanomas is crucial.
* in press
Cloud Deployment
Data In
[
  {
    "patient_id": "P000038",
    "sex": "F",
    "birth_date": "01-01-2006",
    "DX_record": [
      {"date": "07-31-2006", "code": "Z38.00"},
      {"date": "08-07-2006", "code": "P59.9"},
      {"date": "08-29-2016", "code": "J01.90"},
      {"date": "09-10-2016", "code": "J01.90"},
      {"date": "11-14-2016", "code": "J01.91"}
    ],
    "RX_record": [
      {"date": "10-29-2011", "code": "rxLDA017"},
      {"date": "05-16-2015", "code": "rxIDG004"},
      {"date": "08-08-2015", "code": "rxIDG004"},
      {"date": "06-04-2016", "code": "rxIDD013"}
    ],
    "PROC_record": [
      {"date": "02-05-2007", "code": "90723"},
      {"date": "11-05-2007", "code": "J1100"}
    ]
  }
]
Data Out
{
  "predictions": [
    {
      "error_code": "",
      "patient_id": "P000012",
      "predicted_risk": 0.005794344620009157,
      "probability": 0.8253881317184486
    }
  ],
  "target": "TARGET"
}
The Paraknowledge API
curl -X POST -H "Content-Type: application/json" -d '[{"patient_id": "P28109965201", "sex": "M", "age": 89, "fips": "35644", "DX_record": [{"date": "12-16-2011", "code": "R09.02"}, {"date": "12-30-2011", "code": "H04.129"}, {"date": "12-30-2011", "code": "H02.109"}], "RX_record": [], "PROC_record": [{"date": "09-28-2012", "code": "71100"}]}]' "https://us-central1-pkcsaas-01.cloudfunctions.net/zcor_predict?target=IPF&api_key=7eea9f70d79c408f2b69847d911303c"
Current Targets
IPF
ILD
ADRD
CKD
CKD_SEVERE
MELANOMA
CANCER_PANCREAS
CANCER_UTERUS
SISA
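The curl call above can be mirrored in Python roughly as follows. This is a sketch under assumptions: the endpoint, query parameters, payload fields, and target names are copied from the slides, while the helper names, the `YOUR_API_KEY` placeholder, and the reading of an empty `error_code` as success are hypothetical. The request is only constructed here, not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://us-central1-pkcsaas-01.cloudfunctions.net/zcor_predict"
# Target names as listed on the "Current Targets" slide.
TARGETS = {"IPF", "ILD", "ADRD", "CKD", "CKD_SEVERE", "MELANOMA",
           "CANCER_PANCREAS", "CANCER_UTERUS", "SISA"}


def build_request(patients, target, api_key):
    """Build (but do not send) a POST request carrying a list of patient records."""
    if target not in TARGETS:
        raise ValueError(f"unknown target: {target}")
    url = f"{ENDPOINT}?{urlencode({'target': target, 'api_key': api_key})}"
    body = json.dumps(patients).encode("utf-8")
    return Request(url, data=body,
                   headers={"Content-Type": "application/json"}, method="POST")


def parse_predictions(response_text):
    """Index predictions by patient_id.

    Assumption: an empty error_code means the prediction succeeded.
    """
    payload = json.loads(response_text)
    return {p["patient_id"]: p for p in payload["predictions"] if not p["error_code"]}


# Patient record copied from the curl example on the slide.
patient = {
    "patient_id": "P28109965201", "sex": "M", "age": 89, "fips": "35644",
    "DX_record": [{"date": "12-16-2011", "code": "R09.02"},
                  {"date": "12-30-2011", "code": "H04.129"},
                  {"date": "12-30-2011", "code": "H02.109"}],
    "RX_record": [],
    "PROC_record": [{"date": "09-28-2012", "code": "71100"}],
}
request = build_request([patient], "IPF", "YOUR_API_KEY")

# Response text in the shape of the "Data Out" sample.
sample_response = ('{"predictions": [{"error_code": "", "patient_id": "P000012", '
                   '"predicted_risk": 0.005794344620009157, '
                   '"probability": 0.8253881317184486}], "target": "TARGET"}')
scores = parse_predictions(sample_response)
print(request.full_url, scores)
```

With a valid key, the request could be sent with `urllib.request.urlopen(request)` and the decoded response body passed to `parse_predictions`.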
Cohort Selection and Risk Analysis Testbed
Baseline prevalence of IPF in ILD patients
~25%
ZCoR PPV: 60% @ 50% sensitivity
1310 positive patients from 2183 flags
screen failure:
~70% \(\rightarrow\) 40%
Selection comparison against baseline of 2+ ILD risk factors
baseline prevalence: ~2%
projected screen failure:
~98% baseline \(\rightarrow\) 45%
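The screen-failure figures above follow from simple arithmetic, sketched here under the assumption that screen failure is the fraction of selected candidates who turn out not to have the target disease (that is, 1 minus PPV or 1 minus prevalence). The counts and the ~2% prevalence are taken from the slide; the function name is hypothetical.

```python
def screen_failure_rate(true_positives, selected_candidates):
    """Fraction of selected candidates who do not have the target disease."""
    return 1.0 - true_positives / selected_candidates


# ZCoR-selected cohort: 1310 positive patients from 2183 flags.
zcor_ppv = 1310 / 2183
zcor_screen_failure = screen_failure_rate(1310, 2183)

# Baseline selection on 2+ ILD risk factors: prevalence ~2% among those selected.
baseline_screen_failure = 1.0 - 0.02

print(f"ZCoR PPV: {zcor_ppv:.2f}")
print(f"ZCoR screen failure: {zcor_screen_failure:.2f}")
print(f"Baseline screen failure: {baseline_screen_failure:.2f}")
```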
Patient Journeys for IPF: Tracking increasing Risk Over Time
Up to 4-year "signal" resolution
patient journey
Other Examples
decreases risk
increases risk
Risk decreases sometimes
new codes change trajectory as they are revealed
Take-Home Message and Conclusions
Reading (References)
Onishchenko, Dmytro, Yi Huang, James van Horne, Peter J. Smith, Michael E. Msall, and Ishanu Chattopadhyay. “Reduced False Positives in Autism Screening via Digital Biomarkers Inferred from Deep Comorbidity Patterns.” Science Advances 7, no. 41 (October 8, 2021). https://doi.org/10.1126/sciadv.abf0354.
Onishchenko, Dmytro, Daniel S. Rubin, James R. van Horne, R. Parker Ward, and Ishanu Chattopadhyay. “Cardiac Comorbidity Risk Score: Zero‐Burden Machine Learning to Improve Prediction of Postoperative Major Adverse Cardiac Events in Hip and Knee Arthroplasty.” Journal of the American Heart Association 11, no. 15 (August 2, 2022). https://doi.org/10.1161/jaha.121.023745.
Onishchenko, Dmytro, Robert J. Marlowe, Che G. Ngufor, Louis J. Faust, Andrew H. Limper, Gary M. Hunninghake, Fernando J. Martinez, and Ishanu Chattopadhyay. “Screening for Idiopathic Pulmonary Fibrosis Using Comorbidity Signatures in Electronic Health Records.” Nature Medicine 28, no. 10 (September 29, 2022): 2107–16. https://doi.org/10.1038/s41591-022-02010-y.
Huang, Yi, Victor Rotaru, and Ishanu Chattopadhyay. “Sequence Likelihood Divergence for Fast Time Series Comparison.” Knowledge and Information Systems 65, no. 7 (March 16, 2023): 3079–98. https://doi.org/10.1007/s10115-023-01855-0.
Brenner, Lisa A., Lisa M. Betthauser, Molly Penzenik, Anne Germain, Jin Jun Li, Ishanu Chattopadhyay, Ellen Frank, David J. Kupfer, and Robert D. Gibbons. "Development and validation of computerized adaptive assessment tools for the measurement of posttraumatic stress disorder among US military veterans." JAMA Network Open 4, no. 7 (2021): e2115707-e2115707.
QUESTIONS
ishanu@uchicago.edu
@ishanu_ch
Delving Deeper into Learning Goals
Early screening of complex diseases by leveraging deep pattern discovery in history of medical encounters
Use AI to transform the landscape of early disease diagnosis, prevention, and treatment strategies for complex medical conditions.
Realize universal primary care low-burden screening for disorders for which potentially no recommended screening tools exist currently
Generalize beyond known “risk factors”, uncover personalized predictors of future risk of serious diseases from subtle comorbidity signatures