Ishanu Chattopadhyay PRO
ML Data Science Biomedicine Social Science Faculty
Towards a General Theory of Digital Twins In Medicine and Social Modeling
Ishanu Chattopadhyay, PhD
Assistant Professor of Medicine
University of Kentucky
ich248@uky.edu
The Laboratory for Zero Knowledge Discovery
AI/ML learning theory and applications
Implications of AI for the Future of Society
Complex systems
Social interactions & opinion dynamics
Personalized medicine
current population
Step this population into future
Simulate each patient's health trajectory into the future
aggregate over population
Future population demands
Phase 1
Phase 2
Uncorrelated, yet indistinguishable!
Phase 1
Phase 2
PREPARE: Pioneering Research for Early Prediction of Alzheimer's and Related Dementias EUREKA Challenge
Algorithm for early diagnosis
Find Data for early prediction
Phase 1
Phase 2
Second Prize 40,000 USD
Let's give them:
licensed patient data
digital twin
(generative AI)
teomims
(open cohort)
Modeling & predicting complex social interactions
Point-of-care screening for complex diseases
AI
Electronic Healthcare Record
IPF
ASD
ADRD
ZeD Research Thrusts
General framework for inferring digital twins in biology and medicine
Hint: probably not what the classical engineering and design industry meant in the 2000s.
Old Digital Twins:
The first use of the term "digital twin" is generally attributed to Dr. Michael Grieves in a 2002 presentation on product lifecycle management (PLM) at the University of Michigan.
Dr. Grieves discussed the idea of having a virtual representation of a physical product, which would exist throughout the product's lifecycle. This digital model would be used to simulate, predict, and optimize the product's performance, both during design and after it was built. The digital twin would be continuously updated with data from the physical product, enabling real-time analysis and decision-making.
Connected body of models, equations, physics at multiple scales, with observational data to inform states, useful over entire life-cycle of the system
Digital Twin: Generative AI for Complex Systems
"Physics" is unknown/emergent.
Data: multi-modal, disparate data types, disparate scales, noisy, incomplete, often unlabeled
ZCoR Suite:
Disease-specific Digital Twin
"test-free" screening?
We lack Universal Screening
for most diseases
Prognosis at Point-of-Diagnosis
Patient Journey
Early Diagnosis
Reduce screen failure rates
Holistic health surveillance
Predict antifibrotics continuation
improve outcomes
Interstitial Lung Disease / Pulmonary Fibrosis
Rapid Universal Point-of-care Screening for ILD/IPF Using Comorbidity Signatures in Electronic Health Records
Flag patients before they (or doctors) suspect
Primary Care
Pulmonologist
Zero-burden Co-morbid Risk Score (ZCoR)
Referral
shortness of breath
dry cough
doctor can hear velcro crackles
Non-specific Symptoms
>50 years old
more men than women
IPF
Rare disease
~5 in 10,000
Post-Dx
Survival
~4 years
Cannot always be seen on CXR
At least one misdiagnosis
~55%
Two or more misdiagnoses
38%
Initially attributed to age-related symptoms:
72%
PCP workflow demands
Known Co-morbidities of PF
Are there more? Subtle footprints in the medical history that are more heterogeneous?
[Timeline figure. Labels: ZCoR screening; current clinical Dx; current survival ~4 yrs. Interval markers: ~4 yrs.]
Onishchenko, D., Marlowe, R.J., Ngufor, C.G. et al. Screening for idiopathic pulmonary fibrosis using comorbidity signatures in electronic health records. Nat Med 28, 2107–2116 (2022). https://doi.org/10.1038/s41591-022-02010-y
n=~3M
AUC~90%
Likelihood ratio ~30
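One way to read a likelihood ratio of this size is through the standard odds form of Bayes' rule: post-test odds equal pre-test odds times the likelihood ratio. The sketch below is illustrative only. It assumes the figure above is a positive likelihood ratio at some operating point, and it takes the "~5 in 10,000" figure quoted earlier as the pre-test probability. The prevalence in an actually screened population (for example, patients over 50) may be higher, which would raise the result.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Assumed inputs, taken from the slide text (not from the published analysis).
prevalence = 5 / 10_000      # "~5 in 10,000"
likelihood_ratio = 30.0      # "Likelihood ratio ~30"

flagged_risk = post_test_probability(prevalence, likelihood_ratio)
print(f"pre-test: {prevalence:.4%}, post-test: {flagged_risk:.2%}")
```

Under these assumptions a flagged patient's probability rises from 0.05% to roughly 1.5%, about a thirtyfold enrichment for referral.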
Conventional AI/ML attempts to model the physician
AI in IPF Research
ICD administrative codes
IPF
ILD
target codes appear
Past medical history
No target codes appear
case
control
2yrs
2yrs
prediction
IPF drugs prescribed
Signature of IPF diagnostic sequence
pirfenidone or nintedanib
ICD Codes can be noisy
"cases" are not always true IPF
Truven MarketScan (IBM) Commercial Claims & Encounters Database 2003-2018
>100M patients visible
>7B individual claims
>87K unique diagnostic codes
>7% Medicare data present
2,053,277 patients included in study
University of Chicago Medical Center 2012-2021
68,658 patients
Random sample from OptumLabs Data Warehouse, courtesy of Mayo Clinic
861,280 patients
2,983,215 patients
Data: Onishchenko et al., Nat Med (2022)
patient A
patient B
patient C
Beyond "risk factors" to personalized risk patterns
Clinical Trial Cohort Selection
Current screen failure rate ~50-60%
ZCoR-boosted screen failure rate: ~20%
cohort size: 2000
initial cohort size: 5000
initial cohort size with ZCoR: 2500
Cost per patient for confirmatory tests: ~7k USD
Savings: ~20M USD
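The cohort numbers above can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming the upper end of the current screen failure range (60%) and that every screened patient incurs the full confirmatory-test cost. The function name is hypothetical.

```python
def initial_cohort_size(target_enrolled: int, screen_failure_pct: int) -> int:
    """Patients who must be screened so that `target_enrolled` pass screening.

    Uses integer percentages to avoid floating-point rounding; rounds up.
    """
    pass_pct = 100 - screen_failure_pct
    return -(-target_enrolled * 100 // pass_pct)  # ceiling division


target_cohort = 2000            # "cohort size: 2000"
cost_per_patient_usd = 7_000    # "~7k USD" per confirmatory work-up

baseline = initial_cohort_size(target_cohort, 60)   # current practice, 60% failure
with_zcor = initial_cohort_size(target_cohort, 20)  # ZCoR pre-screening, 20% failure
savings_usd = (baseline - with_zcor) * cost_per_patient_usd

print(baseline, with_zcor, savings_usd)
```

With these inputs the initial cohorts come out at 5000 and 2500 patients, and the difference in confirmatory testing is 17.5M USD, which is the same order as the "~20M USD" quoted above.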
Up to 4-year "signal" resolution
decreases risk
increases risk
Patient Journey: Tracking Risk over time
Autism
1 in 59
36
MCHAT/F
Alzheimer's Disease and Related Dementia*
* in press
>5 Million in US. >13 Million in next 10 years
Alzheimer's Disease and Related Dementia
MoCA, Blood Tests
Current Practice:
State of the art with EHR:
~67% AUC*
ZCoR: ~87%
Preempting ADRD accurately up to a decade into the future
Applicable To Screening for Mild Cognitive Impairment
Clinical Trial Participant Selection
Current screen-failure rate: 80-90%
Estimated rate with ZCoR:
40%
ZeD Lab: Predictive Screening from Comorbidity Footprints
CELL Reports
Condition | ZCoR | Competition |
---|---|---|
Autism | >83% | "obvious" |
Alzheimer's Disease | ~90% | 60-70% |
Idiopathic Pulmonary Fibrosis | ~90% | NA |
MACE | ~80% | ~70% |
Bipolar Disorder | ~85% | NA |
CKD | ~85% | NA |
Rare Cancers (Bladder, Uterus) | ~75-80% | Low |
Suicidality (with CAT-SS) | 98% PPV | Low |
Off-the-shelf AI does not suffice
Odds ratios combined via ML
1
Data
cases
control
odds ratios for all ICD codes
ML Model
odds-based risk estimator
Probabilistic Finite State
Map health history to trinary streams
Chattopadhyay, Ishanu, and Hod Lipson. "Abductive learning of quantized stochastic processes with probabilistic finite automata." Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 371, no. 1984 (2013): 20110543.
2
Longitudinal stochastic patterns
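The first ingredient, per-code odds ratios computed from case and control cohorts, can be sketched as follows. This is an illustration rather than the ZCoR implementation. The 0.5 continuity correction, the summed log-odds score, and the placeholder codes `code_x`, `code_y`, `code_z` are all assumptions made for the example. The slides state only that odds ratios are combined via an ML model.

```python
import math
from collections import Counter


def odds_ratios(cases, controls):
    """Odds ratio of each code between cases and controls.

    `cases` and `controls` are lists of sets of diagnostic codes, one set per
    patient. A 0.5 continuity correction (an assumption of this sketch) keeps
    the ratio finite when a code is absent from one cohort.
    """
    case_counts = Counter(code for patient in cases for code in patient)
    ctrl_counts = Counter(code for patient in controls for code in patient)
    ratios = {}
    for code in set(case_counts) | set(ctrl_counts):
        a = case_counts[code] + 0.5                  # cases with the code
        b = len(cases) - case_counts[code] + 0.5     # cases without it
        c = ctrl_counts[code] + 0.5                  # controls with the code
        d = len(controls) - ctrl_counts[code] + 0.5  # controls without it
        ratios[code] = (a * d) / (b * c)
    return ratios


def log_odds_score(patient_codes, ratios):
    """Naive stand-in for an odds-based risk estimator: summed log odds ratios."""
    return sum(math.log(ratios[c]) for c in patient_codes if c in ratios)


# Toy cohorts with placeholder codes (not real ICD codes).
cases = [{"code_x", "code_y"}, {"code_x"}, {"code_y", "code_z"}]
controls = [{"code_z"}, {"code_y"}, {"code_z"}, set()]

ratios = odds_ratios(cases, controls)
print(sorted(ratios.items()))
print(log_odds_score({"code_x"}, ratios), log_odds_score({"code_z"}, ratios))
```

In a fuller pipeline these per-code ratios would be features for a trained model rather than being summed directly.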
minimize generalization error by constraining model capacity
Conservation of complexity!
for digital twins
Timestamped Diagnostic Data
choose disease category
(e.g. infections)
(specialized HMMs)
PFSAs
from code sequences
Model control and case cohorts separately
given a new test case, compute likelihood of sample arising from case models vs control models
sequence likelihood defect
Huang, Yi, Victor Rotaru, and Ishanu Chattopadhyay. "Sequence likelihood divergence for fast time series comparison." Knowledge and Information Systems 65, no. 7 (2023): 3079-3098.
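The pipeline outlined above (encode a patient's timestamped codes as a symbol stream for one disease category, model case and control cohorts separately, then compare likelihoods for a new patient) can be sketched in a few functions. Several pieces here are assumptions for illustration. The weekly encoding (0 = no code, 1 = code in the chosen category, 2 = code from another category) is one plausible trinary mapping. A smoothed first-order Markov chain stands in for a PFSA. The log-likelihood gap is a simplified proxy for the sequence likelihood defect defined in the cited work.

```python
import math

SYMBOLS = (0, 1, 2)


def to_trinary_stream(events, category, n_weeks):
    """Encode (week_index, code) events as one symbol per week.

    0 = no code that week, 1 = a code in `category`, 2 = only other codes.
    """
    stream = [0] * n_weeks
    for week, code in events:
        if code in category:
            stream[week] = 1
        elif stream[week] == 0:
            stream[week] = 2
    return stream


def fit_markov(streams, alpha=1.0):
    """First-order transition probabilities with add-alpha smoothing."""
    counts = {a: {b: alpha for b in SYMBOLS} for a in SYMBOLS}
    for stream in streams:
        for prev, cur in zip(stream, stream[1:]):
            counts[prev][cur] += 1
    return {
        a: {b: counts[a][b] / sum(counts[a].values()) for b in SYMBOLS}
        for a in SYMBOLS
    }


def log_likelihood(stream, model):
    """Log-probability of the observed transitions under `model`."""
    return sum(math.log(model[p][c]) for p, c in zip(stream, stream[1:]))


def likelihood_gap(stream, case_model, control_model):
    """Positive values mean the stream looks more like the case cohort."""
    return log_likelihood(stream, case_model) - log_likelihood(stream, control_model)


# Toy cohorts, already encoded for a hypothetical "infections" category.
case_model = fit_markov([[0, 1, 1, 0, 1, 1, 2, 1], [1, 1, 0, 1, 1, 1, 0, 1]])
control_model = fit_markov([[0, 0, 0, 2, 0, 0, 0, 0], [0, 0, 2, 0, 0, 0, 0, 2]])

# Placeholder codes "inf_a", "inf_b" (in category) and "other" (outside it).
new_stream = to_trinary_stream(
    [(1, "inf_a"), (2, "inf_b"), (4, "other")], {"inf_a", "inf_b"}, n_weeks=6
)
gap = likelihood_gap(new_stream, case_model, control_model)
print(new_stream, round(gap, 3))
```

One such gap per disease category gives a vector of longitudinal features that a downstream model could combine with the odds-based features.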
case
Cloud Deployment
Theoretical formulation
Multi-cohort validation
Launch User-Accessible Platform
3 years
2 years
[
  {
    "patient_id": "P000038",
    "sex": "F",
    "birth_date": "01-01-2006",
    "DX_record": [
      {"date": "07-31-2006", "code": "Z38.00"},
      {"date": "08-07-2006", "code": "P59.9"},
      {"date": "08-29-2016", "code": "J01.90"},
      {"date": "09-10-2016", "code": "J01.90"},
      {"date": "11-14-2016", "code": "J01.91"}
    ],
    "RX_record": [
      {"date": "10-29-2011", "code": "rxLDA017"},
      {"date": "05-16-2015", "code": "rxIDG004"},
      {"date": "08-08-2015", "code": "rxIDG004"},
      {"date": "06-04-2016", "code": "rxIDD013"}
    ],
    "PROC_record": [
      {"date": "02-05-2007", "code": "90723"},
      {"date": "11-05-2007", "code": "J1100"}
    ]
  }
]
{
  "predictions": [
    {
      "error_code": "",
      "patient_id": "P000012",
      "predicted_risk": 0.005794344620009157,
      "probability": 0.8253881317184486
    }
  ],
  "target": "TARGET"
}
Data In
Data Out
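A client for this format might assemble and validate the input record and then pull risks out of the response, along the lines of the sketch below. Only the field names and the MM-DD-YYYY date layout come from the samples above. The helper names are hypothetical, no endpoint or transport is shown because none is given here, and the sketch makes no assumption about how `predicted_risk` and `probability` differ.

```python
import json
from datetime import datetime

DATE_FORMAT = "%m-%d-%Y"  # matches dates such as "07-31-2006" in the sample


def make_event(date, code):
    """Build one dated record entry; raises ValueError on a malformed date."""
    datetime.strptime(date, DATE_FORMAT)
    return {"date": date, "code": code}


def make_patient(patient_id, sex, birth_date, dx=(), rx=(), proc=()):
    """Build one patient record in the 'Data In' layout."""
    datetime.strptime(birth_date, DATE_FORMAT)
    return {
        "patient_id": patient_id,
        "sex": sex,
        "birth_date": birth_date,
        "DX_record": [make_event(d, c) for d, c in dx],
        "RX_record": [make_event(d, c) for d, c in rx],
        "PROC_record": [make_event(d, c) for d, c in proc],
    }


def parse_predictions(response_text):
    """Map patient_id to predicted_risk, skipping entries with an error code."""
    body = json.loads(response_text)
    return {
        p["patient_id"]: p["predicted_risk"]
        for p in body["predictions"]
        if not p["error_code"]
    }


patient = make_patient(
    "P000038", "F", "01-01-2006",
    dx=[("07-31-2006", "Z38.00"), ("08-07-2006", "P59.9")],
    rx=[("10-29-2011", "rxLDA017")],
    proc=[("02-05-2007", "90723")],
)
request_body = json.dumps([patient])  # the input is a JSON list of patients

sample_response = json.dumps({
    "predictions": [{
        "error_code": "",
        "patient_id": "P000012",
        "predicted_risk": 0.005794344620009157,
        "probability": 0.8253881317184486,
    }],
    "target": "TARGET",
})
risks = parse_predictions(sample_response)
print(request_body)
print(risks)
```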
Cohort Selection and Risk Analysis Testbed
Misleading Diagnosis of Idiopathic Pulmonary Fibrosis: A Clinical Concern
Javier Ramos-Rossy, MD, Onix Cantres-Fonseca, MD, Ginger Arzon-Nieves, Yomayra Otero-Dominguez, MD, Stella Baez-Corujo, MD, and William Rodríguez-Cintrón, MD
Enable more holistic approaches to medicine, where predictive patterns can be rapidly recognized and exploited
Digital Twins for complex systems
Darkome
teomims
opinion dynamics
algorithmic lie detector
Mental health diagnosis
viral emergence
microbiome
By Ishanu Chattopadhyay
AI for medicine