Third-wave AI in Medicine:
From Test-free Screening of Complex Diseases
to
Digital Twins of Microbiomes and Pandemics
Ishanu Chattopadhyay, PhD
Assistant Professor of Medicine
University of Chicago
ishanu@uchicago.edu
first wave
rule-based systems
second wave
Big Data / ML / Deep Learning
recognize patterns, make predictions, might improve over time, but struggle on tasks not trained for
third wave
contextual reasoning, generalizable, towards true intelligence
mathematics
computer science
social science
medicine
AI/ML learning theory and applications
Complex systems
Implications of AI for the Future of Society
University of Chicago Medicine
The Laboratory for Zero Knowledge Discovery
collaborators
Alex Leow, Psychiatry, UIC
Anna Podolanczuk, Pulmonary Care, Weill Cornell
Gary Hunninghake, Pulmonary Critical Care, Harvard
Robert Gibbons, Biostatistics
Daniel Rubins, Anesthesia and Critical Care
Peter Smith, Pediatrics
Michael Msall, Pediatrics
Fernando Martinez, Pulmonary Critical Care, Weill Cornell
James Mastrianni, Neurology
James Evans, Sociology
Erika Claud, Pediatrics
Aaron Esser-Kahn, Molecular Engineering
David Llewellyn, University of Exeter
Kenneth Rockwood, Dalhousie University
Andrew Limper, Mayo Clinic
zed.uchicago.edu
Department of Pediatrics
UChicago
Department of Neurology & The Memory Center
UChicago
Department of Psychiatry
UChicago
Pulmonary Critical Care, Weill Cornell
Department of Anesthesia and Critical Care
UChicago
Center for Health Statistics
UChicago
Pulmonary Critical Care, Harvard Medical School
Department of Psychiatry
UIC
Demon Network, Exeter, Alan Turing Institute, UK
Dalhousie University, Canada
Pritzker School of Molecular Engineering
Social Science
UChicago
Dr. Shahab Asoodeh
Dr. Yi Huang
Dmytro Onishchenko
Victor Rotaru
Jin Li
Ruolin Zhang
David Yang
Dr. Nicholas Sizemore
Drew Vlasnik
Lucas Mantovani
Jaydeep Dhanoa
Jasmine Mithani
Angela Zhang
Warren Mo
Kevin Wu
D3M (I2O)
PAI (DSO)
PREEMPT (BTO)
YFA (DSO)
NIA
Nature Medicine
Nature Human Behaviour
Nature Communications
Science Advances
(3)
PNAS
JAMA
JAHA
JACC
Publications
&
Impact
Research Direction I
Point-of-care screening for complex diseases
Can we reliably screen for complex diseases such as pulmonary fibrosis, dementia and rare cancers?
Research Direction II
Digital Twins
General framework for inferring digital twins in biology and medicine
AI
Electronic Healthcare Record
IPF
ASD
ADRD
Research Direction I
"test-free" screening?
We lack Universal Screening
for most diseases
Prognosis at Point-of-Diagnosis
Patient Journey
Early Diagnosis
Reduce screen failure rates
Holistic health surveillance
Predict antifibrotics continuation
improve outcomes
1
2
3
Interstitial Lung Disease / Pulmonary Fibrosis
Rapid Universal Point-of-care Screening for ILD/IPF Using Comorbidity Signatures in Electronic Health Records
Flag patients before they (or doctors) suspect
Primary Care
Pulmonologist
Zero-burden Co-morbid Risk Score (ZCoR)
Referral
shortness of breath
dry cough
doctor can hear velcro crackles
Non-specific Symptoms
>50 years old
more men than women
IPF
Rare disease
~5 in 10,000
Post-Dx
Survival
~4 years
Cannot always be seen on CXR
At least one misdiagnosis
~55%
Two or more misdiagnoses
38%
Initially attributed to age related symptoms:
72%
PCP workflow demands
Known Co-morbidities of PF
Are there more? Subtle footprints in the medical history that are more heterogeneous?
~ 4yrs
current survival ~4yrs
~ 4yrs
current clinical DX
ZCoR screening
Onishchenko, D., Marlowe, R.J., Ngufor, C.G. et al. Screening for idiopathic pulmonary fibrosis using comorbidity signatures in electronic health records. Nat Med 28, 2107–2116 (2022). https://doi.org/10.1038/s41591-022-02010-y
n=~3M
AUC~90%
Likelihood ratio ~30
Conventional AI/ML attempts to model the physician
AI in IPF Research
ICD administrative codes
IPF
ILD
target codes appear
Past medical history
No target codes appear
case
control
2yrs
2yrs
prediction
IPF drugs prescribed
Signature of IPF diagnostic sequence
pirfenidone or nintedanib
ICD Codes can be noisy
"cases" are not always true IPF
Truven MarketScan (IBM) Commercial Claims & Encounters Database 2003-2018
>100M patients visible
>7B individual claims
>87K unique diagnostic codes
>7% Medicare data present
2,053,277 patients included in study
University of Chicago Medical Center 2012-2021
68,658 patients
Random sample from OptumLabs Data Warehouse, courtesy of Mayo Clinic
861,280 patients
2,983,215 patients
Data: Onishchenko et al., Nat Med 2022
patient A
patient B
patient C
Beyond "risk factors" to personalized risk patterns
Clinical Trial Cohort Selection
Current screen failure rate ~50-60%
Screen failure rate with ZCoR: ~20%
cohort size: 2000
initial cohort size: 5000
initial cohort size with ZCoR: 2500
Cost per patient for confirmatory tests: ~7k USD
Savings: ~20M USD
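The cohort arithmetic on this slide can be reproduced with a short sketch. It assumes the initial cohort is simply the target enrollment divided by the screening pass rate, and that savings are the avoided confirmatory tests times the per-patient cost; the function names are illustrative and not part of ZCoR. Using the 60% end of the current failure range, the result is 17.5M USD, which the slide rounds to ~20M.

```python
def initial_cohort_size(target_enrolled: int, screen_failure_pct: int) -> int:
    """Patients to screen so that `target_enrolled` pass screening."""
    if not 0 <= screen_failure_pct < 100:
        raise ValueError("screen_failure_pct must be in [0, 100)")
    pass_pct = 100 - screen_failure_pct
    # Ceiling division in integers avoids floating-point rounding surprises.
    return -(-target_enrolled * 100 // pass_pct)


def screening_savings(target_enrolled: int, baseline_failure_pct: int,
                      improved_failure_pct: int, cost_per_patient: int) -> int:
    """Cost of the confirmatory tests avoided by the lower failure rate."""
    baseline = initial_cohort_size(target_enrolled, baseline_failure_pct)
    improved = initial_cohort_size(target_enrolled, improved_failure_pct)
    return (baseline - improved) * cost_per_patient


if __name__ == "__main__":
    print(initial_cohort_size(2000, 60))            # cohort without ZCoR
    print(initial_cohort_size(2000, 20))            # cohort with ZCoR
    print(screening_savings(2000, 60, 20, 7000))    # savings in USD
```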
Up to 4 year "signal" resolution
decreases risk
increases risk
Patient Journey: Tracking Risk over time
Autism
1 in 59
36
MCHAT/F
Alzheimer's Disease and Related Dementia*
* in press
>5 Million in US. >13 Million in next 10 years
Alzheimer's Disease and Related Dementia
Current practice: MoCA, blood tests
State of the art with EHR: ~67% AUC*
ZCoR: ~87%
Preempting ADRD accurately up to a decade in the future
Applicable To Screening for Mild Cognitive Impairment
Clinical Trial Participant Selection
Current screen-failure rate: 80-90%
Estimated rate with ZCoR: 40%
ZeD Lab: Predictive Screening from Comorbidity Footprints
Cell Reports
Condition | ZCoR | Competition |
---|---|---|
Autism | >83% | "obvious" |
Alzheimer's Disease | ~90% | 60-70% |
Idiopathic Pulmonary Fibrosis | ~90% | NA |
MACE | ~80% | ~70% |
Bipolar Disorder | ~85% | NA |
CKD | ~85% | NA |
Rare Cancers (Bladder, Uterus) | ~75-80% | Low |
Suicidality (with CAT-SS) | 98% PPV | Low |
Off-the-shelf AI does not suffice
Odds ratios combined via ML
1
Data
cases
control
odds ratios for all ICD codes
ML Model
odds-based risk estimator
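A minimal sketch of the first ingredient, per-code odds ratios computed from case and control cohorts. The 0.5 continuity correction and the toy cohorts are assumptions made for illustration, not details taken from the slides, and the downstream ML model that combines the ratios is not shown.

```python
from collections import Counter


def odds_ratios(cases, controls, smoothing=0.5):
    """Odds ratio of each diagnostic code, cases versus controls.

    `cases` and `controls` are lists of per-patient code lists.
    `smoothing` is an assumed continuity correction that keeps the
    ratio finite when a code is absent from one cohort.
    """
    n_case, n_ctrl = len(cases), len(controls)
    # Count patients (not occurrences) carrying each code.
    case_counts = Counter(code for history in cases for code in set(history))
    ctrl_counts = Counter(code for history in controls for code in set(history))

    ratios = {}
    for code in set(case_counts) | set(ctrl_counts):
        a = case_counts[code] + smoothing             # cases with the code
        b = n_case - case_counts[code] + smoothing    # cases without it
        c = ctrl_counts[code] + smoothing             # controls with the code
        d = n_ctrl - ctrl_counts[code] + smoothing    # controls without it
        ratios[code] = (a * d) / (b * c)
    return ratios


if __name__ == "__main__":
    # Toy cohorts with made-up histories, for illustration only.
    toy_cases = [["J84.1", "R05"], ["R05"], ["K21"]]
    toy_controls = [["K21"], ["I10"], ["K21", "I10"], []]
    for code, ratio in sorted(odds_ratios(toy_cases, toy_controls).items()):
        print(code, round(ratio, 3))
```

Codes with a ratio above 1 are over-represented among cases; a vector of such ratios (or their logarithms) over a patient's history is one plausible input to the ML model named on the slide.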
Probabilistic Finite State
Map health history to trinary streams
Chattopadhyay, Ishanu, and Hod Lipson. "Abductive learning of quantized stochastic processes with probabilistic finite automata." Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 371, no. 1984 (2013): 20110543.
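One way to picture the mapping of a health history to a trinary stream is sketched below. The encoding is an assumption made for illustration: time is binned into weeks, and each week is marked 0 (no diagnostic code), 1 (a code from the disease category of interest), or 2 (only codes from other categories). The slide does not spell out these details, and the function name is hypothetical.

```python
from datetime import date


def trinary_stream(records, category_codes, start, n_weeks):
    """Encode dated diagnostic codes as a weekly stream over {0, 1, 2}.

    Assumed encoding: 0 = no code that week, 1 = at least one code in
    `category_codes`, 2 = only codes outside the category.
    """
    stream = [0] * n_weeks
    for day, code in records:
        week = (day - start).days // 7
        if not 0 <= week < n_weeks:
            continue  # ignore records outside the observation window
        if code in category_codes:
            stream[week] = 1          # an in-category code takes precedence
        elif stream[week] == 0:
            stream[week] = 2
    return stream


if __name__ == "__main__":
    history = [
        (date(2016, 8, 29), "J01.90"),
        (date(2016, 9, 10), "J01.90"),
        (date(2016, 9, 12), "K21.9"),
    ]
    print(trinary_stream(history, {"J01.90", "J01.91"}, date(2016, 8, 29), 4))
```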
2
Longitudinal stochastic patterns
PFSAs
from code sequences
Model control and case cohorts separately
Given a new test case, compute the likelihood of the sample arising from the case models vs. the control models
sequence likelihood defect
Huang, Yi, Victor Rotaru, and Ishanu Chattopadhyay. "Sequence likelihood divergence for fast time series comparison." Knowledge and Information Systems 65, no. 7 (2023): 3079-3098.
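The case-versus-control comparison can be sketched as follows. This is a simplified stand-in, not the method of the cited papers: first-order Markov chains with add-one smoothing replace the inferred PFSAs, and the "defect" is taken to be the log-likelihood of a stream under the case model minus its log-likelihood under the control model. All names and the toy streams are illustrative.

```python
import math

ALPHABET = (0, 1, 2)


def fit_markov(streams, pseudocount=1.0):
    """Fit a first-order Markov chain (a stand-in for a PFSA) to streams."""
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for stream in streams:
        for prev, nxt in zip(stream, stream[1:]):
            counts[prev][nxt] += 1
    return {
        a: {b: counts[a][b] / sum(counts[a].values()) for b in ALPHABET}
        for a in ALPHABET
    }


def log_likelihood(stream, model):
    """Log-probability of the observed transitions under the model."""
    return sum(math.log(model[p][n]) for p, n in zip(stream, stream[1:]))


def likelihood_defect(stream, case_model, control_model):
    """Positive when the stream looks more like the case cohort."""
    return log_likelihood(stream, case_model) - log_likelihood(stream, control_model)


if __name__ == "__main__":
    case_model = fit_markov([[0, 1, 1, 1, 0, 1, 1], [1, 1, 1, 0, 1, 1, 1]])
    control_model = fit_markov([[0, 0, 0, 2, 0, 0, 0], [0, 0, 2, 0, 0, 0, 0]])
    print(likelihood_defect([1, 1, 1, 1, 1], case_model, control_model))
    print(likelihood_defect([0, 0, 0, 0, 0], case_model, control_model))
```

Computed per disease category, such defects give longitudinal features that complement the static odds ratios.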
Cloud Deployment
Theoretical formulation
Multi-cohort validation
Launch User-Accessible Platform
3 years
2 years
[
{
"patient_id": "P000038",
"sex": "F",
"birth_date": "01-01-2006",
"DX_record": [
{"date": "07-31-2006", "code": "Z38.00"},
{"date": "08-07-2006", "code": "P59.9"},
{"date": "08-29-2016", "code": "J01.90"},
{"date": "09-10-2016", "code": "J01.90"},
{"date": "11-14-2016", "code": "J01.91"}
],
"RX_record": [
{"date": "10-29-2011", "code": "rxLDA017"},
{"date": "05-16-2015", "code": "rxIDG004"},
{"date": "08-08-2015", "code": "rxIDG004"},
{"date": "06-04-2016", "code": "rxIDD013"}
],
"PROC_record": [
{"date": "02-05-2007", "code": "90723"},
{"date": "11-05-2007", "code": "J1100"}
]
}
]
{
"predictions": [
{
"error_code": "",
"patient_id": "P000012",
"predicted_risk": 0.005794344620009157,
"probability": 0.8253881317184486
}
],
"target": "TARGET"
}
Data In
Data Out
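A small sketch of how a client might check a record before submission and read the response, based only on the field names visible in the example payloads above. The MM-DD-YYYY date format is inferred from the sample dates, the helper names are hypothetical, and no endpoint or transport is assumed.

```python
import json
from datetime import datetime

DATE_FORMAT = "%m-%d-%Y"   # inferred from sample dates such as "07-31-2006"
RECORD_KEYS = ("DX_record", "RX_record", "PROC_record")


def validate_patient(patient: dict) -> None:
    """Raise ValueError if a patient record does not match the sample layout."""
    for key in ("patient_id", "sex", "birth_date"):
        if key not in patient:
            raise ValueError(f"missing field: {key}")
    datetime.strptime(patient["birth_date"], DATE_FORMAT)
    for key in RECORD_KEYS:
        for entry in patient.get(key, []):
            datetime.strptime(entry["date"], DATE_FORMAT)  # raises on bad dates
            if not entry.get("code"):
                raise ValueError(f"empty code in {key}")


def risks_by_patient(response_text: str) -> dict:
    """Map patient_id to predicted_risk, skipping entries with an error code."""
    response = json.loads(response_text)
    return {
        p["patient_id"]: p["predicted_risk"]
        for p in response["predictions"]
        if not p["error_code"]
    }


if __name__ == "__main__":
    patient = {
        "patient_id": "P000038", "sex": "F", "birth_date": "01-01-2006",
        "DX_record": [{"date": "07-31-2006", "code": "Z38.00"}],
        "RX_record": [], "PROC_record": [],
    }
    validate_patient(patient)
    payload = json.dumps([patient])   # the input is a JSON list of patients
    sample_response = (
        '{"predictions": [{"error_code": "", "patient_id": "P000012", '
        '"predicted_risk": 0.0058, "probability": 0.825}], "target": "TARGET"}'
    )
    print(risks_by_patient(sample_response))
```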
Cohort Selection and Risk Analysis Testbed
Misleading Diagnosis of Idiopathic Pulmonary Fibrosis: A Clinical Concern
Javier Ramos-Rossy, MD, Onix Cantres-Fonseca, MD, Ginger Arzon-Nieves, Yomayra Otero-Dominguez, MD, Stella Baez-Corujo, MD, and William Rodríguez-Cintrón, MD
Enable more holistic approaches to medicine, where predictive patterns can be rapidly recognized and exploited
Research Direction II
Digital Twins
General framework for inferring digital twins in biology and medicine
Chattopadhyay, Ishanu, Kevin Wu, Jin Li, and Aaron Esser-Kahn. "Emergenet: Fast Scalable Pandemic Risk Assessment of Influenza A Strains Circulating In Non-human Hosts." (2023). Under Review in Nature
PREEMPT
Predicting Future Mutations for Viral Genomes in the Wild
predict future emergence risk
Q-Net
recursive forest
q-distance
a biologically informed, adaptive distance between strains
Smaller distances imply a quantifiably higher probability of a spontaneous jump between strains
$$J \textrm{ is the Jensen-Shannon divergence }$$
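The Jensen-Shannon divergence itself is standard and can be computed as below. How it enters the q-distance is not recoverable from the slide text, so the second function is only an assumed reading: it averages the square root of the divergence over per-position distributions predicted for two strains. The name `mean_sqrt_js` and its inputs are illustrative.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i = 0 contribute nothing to the KL divergence.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    # Clamp at zero to guard against tiny negative rounding errors.
    return max(0.0, 0.5 * kl(p, m) + 0.5 * kl(q, m))


def mean_sqrt_js(dists_x, dists_y):
    """Assumed q-distance-like score: mean sqrt-JS over aligned positions."""
    if len(dists_x) != len(dists_y) or not dists_x:
        raise ValueError("need equal-length, non-empty lists of distributions")
    total = sum(math.sqrt(js_divergence(p, q)) for p, q in zip(dists_x, dists_y))
    return total / len(dists_x)


if __name__ == "__main__":
    # Two hypothetical positions, each with a distribution over 2 residues.
    strain_x = [[0.9, 0.1], [0.5, 0.5]]
    strain_y = [[0.8, 0.2], [0.5, 0.5]]
    print(mean_sqrt_js(strain_x, strain_y))
```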
Metric Structure
Tangent Bundle
geometry
dynamics
Influenza Risk Assessment Tool (IRAT) scoring for animal strains
slow (months), quasi-subjective, expensive
*https://www.cdc.gov/flu/pandemic-resources/monitoring/irat-virus-summaries.htm
24 scores in 14 years
~10,000 strains collected annually
CDC
Emergenet time: 1 second
Stamping Out the Next Pandemic **Before** The First Human Infection
BioNorad
THE PROBLEM
Assuming a 1000 species ecosystem, and 1 successful experiment every day to discern a single two-way relationship, we would need 1,368 years to go through all possibilities.
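The figure follows from counting unordered species pairs. A quick check, assuming one experiment per day and 365 days per year:

```python
import math


def years_to_test_all_pairs(n_species: int, experiments_per_day: int = 1,
                            days_per_year: int = 365) -> float:
    """Years needed to test every two-way relationship once."""
    pairs = math.comb(n_species, 2)   # n * (n - 1) / 2 unordered pairs
    return pairs / (experiments_per_day * days_per_year)


if __name__ == "__main__":
    print(math.comb(1000, 2))              # number of species pairs
    print(years_to_test_all_pairs(1000))   # years at one experiment per day
```

For 1000 species this gives 499,500 pairs, or roughly 1,368 years.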
Digital Twin for the Maturing Human Microbiome
Boston U
U Chicago
Two centers
Ability to "fill in" missing data is equivalent to making trajectory forecasts
predicting neurodevelopmental deficits
forecasting ecosystem trajectories
Which entities are most predictive
of neurodevelopmental deficit
entity X timestamp
SHAP value
No transplantation is guaranteed to work reliably
Just add those microbes back to reduce risk?
No!
Bacterial transplantation must be personalized
Future task:
Explicit supplantation profiles that are tuned to individual ecosystems
Mental health diagnosis
opinion dynamics
microbiome
viral emergence
Digital Twins for complex systems
algorithmic lie detector
teomims
Darkome
What other problems can it solve?
Phase 1
Phase 2
Let's give them:
teomims
(open cohort)
licensed patient data
digital twin
(generative AI)
PREPARE: Pioneering Research for Early Prediction of Alzheimer's and Related Dementias EUREKA Challenge
Algorithm for early diagnosis
Find Data for early prediction
Phase 1
Phase 2
Uncorrelated, yet indistinguishable !!
VeRITAS
Can a Generative AI Tell If You Are Lying?
Vetting Response Integrity from cross-Talk in Adversarial Surveys
Hidden structure of cross-talk between responses to interview items
PTSD diagnostic interview
Q-Net
Number of possible responses
Minimum Performance (n=624)
Average Time: 3.5 min
No. of questions: 20
AUC > 0.95
PPV > 0.86
NPV > 0.92
At least 83.3% sensitivity at 94% specificity
Minimum AUC = \(0.95 \pm 0.005\)
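PPV and NPV depend on prevalence as well as on sensitivity and specificity. The sketch below applies Bayes' rule to the operating point quoted above; the prevalence of 0.3 is a purely illustrative assumption, since the slide does not state the prevalence in the validation sets.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test at a given prevalence, via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    # 83.3% sensitivity at 94% specificity; prevalence 0.3 is hypothetical.
    ppv, npv = predictive_values(0.833, 0.94, 0.3)
    print(round(ppv, 3), round(npv, 3))
```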
Cannot be coached, or memorized
Datasets for training & validation
1. VA (n=294)
2. Prolific (n=300)
3. Psychiatrists (n=30)
Beat the test!
200 participants in US
100 participants in UK
30 forensic psychiatrists
10
6
1
Can-You-Fake-PTSD Challenge Results
successful attempts
Future
Vision
Transform bio-surveillance
Transform modeling of complex systems
Transform early diagnosis
Democratize AI, unleashing its power for social good
ishanu chattopadhyay
ishanu@uchicago.edu
Impact on Popular Discourse on AI
In
National Pop-culture Discourse
Interviews, Op-eds, and Forum Appearances
Media Coverage
Rotaru, Victor, Yi Huang, Timmy Li, James Evans, and Ishanu Chattopadhyay. "Event-level prediction of urban crime reveals a signature of enforcement bias in US cities." Nature Human Behaviour 6, no. 8 (2022): 1056-1068.