Towards a General Theory of Digital Twins in Medicine and Social Modeling

Ishanu Chattopadhyay, PhD

Assistant Professor of Medicine

University of Kentucky

ich248@uky.edu

The Laboratory for Zero Knowledge Discovery

AI/ML learning theory and applications

Implications of AI for the Future of Society

Complex systems

Social interactions & opinion dynamics

Personalized medicine

current population

Step this population into future 

Simulate each patient's health trajectory into the future

aggregate over population

Future population demands
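The projection loop outlined above (simulate each patient, then aggregate) might be sketched as follows. This is a toy illustration only: the constant annual onset probability, the function names `simulate_trajectory` and `project_demand`, and the example population are all assumptions standing in for a real per-patient digital twin.

```python
import random
from collections import Counter

def simulate_trajectory(onset_prob_per_year, years, rng):
    """Return the 1-based year of first onset, or None if no onset occurs.

    Toy stand-in for a per-patient digital twin: a constant annual onset
    probability (an assumption made only for this sketch).
    """
    for year in range(1, years + 1):
        if rng.random() < onset_prob_per_year:
            return year
    return None

def project_demand(population, years, seed=0):
    """Step every patient forward and count new onsets per future year."""
    rng = random.Random(seed)
    onsets_per_year = Counter()
    for onset_prob in population:
        year = simulate_trajectory(onset_prob, years, rng)
        if year is not None:
            onsets_per_year[year] += 1
    return [onsets_per_year[y] for y in range(1, years + 1)]

# Hypothetical population: 1,000 patients, each with a 2% annual onset risk.
demand = project_demand([0.02] * 1000, years=5)
print(demand)  # new cases expected in each of the next five years
```

Replacing `simulate_trajectory` with a learned individual-level model leaves the aggregation step unchanged.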

Phase 1

Phase 2

Uncorrelated, yet indistinguishable!

Phase 1

Phase 2

PREPARE: Pioneering Research for Early Prediction of Alzheimer's and Related Dementias EUREKA Challenge

Algorithm for early diagnosis

Find Data for early prediction

Phase 1

Phase 2

Second Prize 40,000 USD

Let's give them:

  • Clinical data for 1M patients aged 60-80 years diagnosed with ADRD/AD
  • 1M African-American patients from Chicagoland
  • Open source: GNU General Public License

licensed patient data

digital twin

(generative AI)

teomims

(open cohort)

Modeling & predicting complex social interactions

Point-of-care screening for complex diseases

AI

Electronic Health Record

IPF

ASD

ADRD

ZeD Research Thrusts

General framework for inferring digital twins in biology and medicine

What is a Digital Twin?

Hint: probably not what the classical engineering and design industry meant in the 2000s.

Old Digital Twins:

The first use of the term "digital twin" is generally attributed to Dr. Michael Grieves in a 2002 presentation on product lifecycle management (PLM) at the University of Michigan. 

 

Dr. Grieves discussed the idea of having a virtual representation of a physical product, which would exist throughout the product's lifecycle. This digital model would be used to simulate, predict, and optimize the product's performance, both during design and after it was built. The digital twin would be continuously updated with data from the physical product, enabling real-time analysis and decision-making.

 

A connected body of models, equations, and physics at multiple scales, with observational data to inform states, useful over the entire life cycle of the system

 Digital Twin: Generative AI for Complex Systems

"Physics" is unknown/emergent.

 

Data: multi-modal, disparate data types, disparate scales, noisy, incomplete, often unlabeled

ZCoR Suite:

Disease-specific Digital Twin

  • Predict the risk of a single, well-defined outcome
  • Maintain a model for each individual that remains operational throughout their lifetime
  • As the health trajectory evolves, so does the individual-level risk
  • Easy to specialize to different healthcare contexts

"test-free" screening?

  • Autism
  • Idiopathic Pulmonary Fibrosis
  • Alzheimer's Disease and related dementia
  • Suicidality, PTSD
  • Perioperative Cardiac Event
  • Aggressive Melanoma
  • Uterine Cancer
  • Pancreatic Cancer

  • Non-existent biomarkers
  • Expensive, time-consuming diagnostic tests
  • Lack of universal screening at the point of care
  • Early diagnosis is difficult; late or missed diagnosis costs lives

We lack universal screening for most diseases

Prognosis at Point-of-Diagnosis 

  • Optimizing Management

Patient Journey 

  • Continuous Risk Monitoring

Early Diagnosis

  • Universal Screening
  • Cohort Selection

Reduce screen failure rates

Holistic health surveillance

Predict antifibrotics continuation

improve outcomes

1

2

3

Interstitial Lung Disease / Pulmonary Fibrosis

Rapid Universal Point-of-care Screening for ILD/IPF Using Comorbidity Signatures in Electronic Health Records

Flag patients before they (or their doctors) suspect disease

Primary Care

Pulmonologist

Zero-burden Co-morbid Risk Score (ZCoR)

Referral

shortness of breath

dry cough

doctor can hear velcro crackles

Non-specific Symptoms

>50 years old

more men than women

IPF

Rare disease

~5 in 10,000

Post-Dx

Survival

~4 years

Cannot always be seen on CXR

At least one misdiagnosis

~55%

Two or more misdiagnoses

38%

Initially attributed to age related symptoms:

72%

PCP workflow demands

Known Co-morbidities of PF

Are there more? Subtle footprints in the medical history that are more heterogeneous? 

~ 4yrs

current survival ~4 yrs

~ 4yrs

current clinical DX

ZCoR screening

Onishchenko, D., Marlowe, R.J., Ngufor, C.G. et al. Screening for idiopathic pulmonary fibrosis using comorbidity signatures in electronic health records. Nat Med 28, 2107–2116 (2022). https://doi.org/10.1038/s41591-022-02010-y

n=~3M

AUC~90%

Likelihood ratio ~30
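To give a sense of what a likelihood ratio of ~30 means for a rare disease, the short calculation below converts a pre-test prevalence into a post-test probability. It assumes the quoted figure is a positive likelihood ratio and uses the ~5 in 10,000 prevalence quoted earlier for IPF; the function name is illustrative.

```python
def post_test_probability(prevalence, likelihood_ratio):
    """Bayes' rule in odds form: post-test odds = pre-test odds x LR."""
    pre_odds = prevalence / (1.0 - prevalence)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)

# Prevalence ~5 in 10,000 and likelihood ratio ~30, both taken from the slides.
p = post_test_probability(prevalence=5 / 10_000, likelihood_ratio=30)
print(f"post-test probability: {p:.2%}")
```

Under these assumptions a flagged patient's probability of disease rises from 0.05% to roughly 1.5%, which is why a positive flag is better read as a prompt for referral than as a diagnosis.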

Conventional AI/ML attempts to model the physician

AI in IPF Research

  • Co-morbidity patterns
  • No data demands
  • Use whatever data is already on patient file

ICD administrative codes

IPF

ILD

target codes appear

Past medical history

No target codes appear

case

control

2yrs

2yrs

prediction

IPF drugs prescribed

Signature of IPF diagnostic sequence

pirfenidone or nintedanib

  • age > 50 years
  • at least two IPF target codes identified at least 1 month apart 
  • chest CT procedure (ICD-9-CM 87.41 and Current Procedural Terminology, 4th Edition, codes 71250, 71260 and 71270) before the first diagnostic claim for IPF
  • no claims for alternative ILD codes occurring on or after the first IPF claim
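The four criteria above can be read as a filter over a patient's claims history. The sketch below is one possible reading, not the study's actual code: the claim representation as `(date, code)` tuples, the 30-day interpretation of "1 month", and the caller-supplied `ipf_codes` and `other_ild_codes` sets are assumptions. Only the chest CT codes are taken from the text.

```python
from datetime import date, timedelta

# Chest CT codes listed in the criteria (ICD-9-CM 87.41; CPT 71250, 71260, 71270).
CHEST_CT_CODES = {"87.41", "71250", "71260", "71270"}

def meets_case_definition(age, claims, ipf_codes, other_ild_codes,
                          min_gap=timedelta(days=30)):
    """Return True if the claims satisfy all four criteria.

    claims: list of (date, code) tuples. The two code sets are supplied by
    the caller; "1 month" is approximated as 30 days (an assumption).
    """
    ipf_dates = sorted(d for d, c in claims if c in ipf_codes)
    if age <= 50 or len(ipf_dates) < 2:
        return False
    first_ipf = ipf_dates[0]
    # Some pair of IPF claims is at least min_gap apart iff the extremes are.
    if ipf_dates[-1] - first_ipf < min_gap:
        return False
    # A chest CT must precede the first IPF diagnostic claim.
    if not any(c in CHEST_CT_CODES and d < first_ipf for d, c in claims):
        return False
    # No alternative ILD code on or after the first IPF claim.
    return not any(c in other_ild_codes and d >= first_ipf for d, c in claims)

# Placeholder codes "IPF" and "ILD_X" are hypothetical, for illustration only.
example = [
    (date(2015, 1, 10), "71250"),
    (date(2015, 2, 1), "IPF"),
    (date(2015, 4, 1), "IPF"),
]
print(meets_case_definition(62, example, {"IPF"}, {"ILD_X"}))
```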

ICD Codes can be noisy

"cases" are not always true IPF

Truven MarketScan (IBM)
Commercial Claims & Encounters Database
2003-2018

>100M patients visible 

>7B individual claims

>87K unique diagnostic codes

>7% Medicare data present

2,053,277 patients included in study

University of Chicago Medical Center 
2012-2021

68,658 patients

Random sample from OptumLabs Data Warehouse, courtesy of Mayo Clinic

861,280 patients 

2,983,215 patients in total

Data: Onishchenko et al., Nat Med 2022

patient A

patient B

patient C

Beyond "risk factors" to personalized risk patterns

Clinical Trial Cohort Selection

Current screen failure rate ~50-60%

Screen failure rate with ZCoR: ~20%

cohort size: 2000

initial cohort size: 5000

initial cohort size with ZCoR: 2500

Cost per patient for confirmatory tests: ~7k USD

Savings: ~20M USD
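The cohort figures above follow from simple arithmetic, reproduced here as a check. The sketch takes the upper end of the current screen failure range (60%), which is the value that yields the quoted initial cohort of 5000; the function name is illustrative.

```python
def patients_to_screen(target_enrolled, screen_failure_pct):
    """Smallest number screened so that target_enrolled pass, on average."""
    pass_pct = 100 - screen_failure_pct
    return -(-target_enrolled * 100 // pass_pct)  # integer ceiling division

target = 2000                 # cohort size
cost_per_patient = 7_000      # USD, confirmatory tests

current = patients_to_screen(target, 60)    # 5000
with_zcor = patients_to_screen(target, 20)  # 2500
savings = (current - with_zcor) * cost_per_patient
print(current, with_zcor, savings)          # 5000 2500 17500000
```

The result, 17.5M USD, is consistent with the ~20M USD quoted above after rounding.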

Up to 4-year "signal" resolution

decreases risk

increases risk

Patient Journey: Tracking Risk over time

Autism

1 in 59

36

MCHAT/F

Alzheimer's Disease and Related Dementia*

* in press

>5 million in the US; >13 million in the next 10 years

Alzheimer's Disease and Related Dementia

Current practice: MoCA, blood tests

State of the art with EHR: ~67% AUC*

ZCoR: ~87%

Preempting ADRD accurately up to a decade into the future

Applicable To Screening for Mild Cognitive Impairment

Clinical Trial Participant Selection

Current screen-failure rate: 80-90%

 

Estimated rate with ZCoR: 40%

ZeD Lab: Predictive Screening from Comorbidity Footprints

CELL Reports

Condition                          ZCoR        Competition
Autism                             >83%        "obvious"
Alzheimer's Disease                ~90%        60-70%
Idiopathic Pulmonary Fibrosis      ~90%        NA
MACE                               ~80%        ~70%
Bipolar Disorder                   ~85%        NA
CKD                                ~85%        NA
Rare Cancers (Bladder, Uterus)     ~75-80%     Low
Suicidality (with CAT-SS)          98% PPV     Low

Off-the-shelf AI does not suffice

How?

1. Odds ratios combined via ML

Data (cases, controls) → odds ratios for all ICD codes → ML model → odds-based risk estimator

\rho(X) = \zeta\left( \bigcup_i \big\{ \mathcal{O}(x_i) \big\} \right)

Minimize generalization error by constraining model capacity

Conservation of complexity!

K(x) = K(S) + K(x \vert S_\star) + O(1)

For digital twins:

K(x \vert S_\star) = O(1)
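The first ingredient, an odds ratio for every diagnostic code, might be computed as in the sketch below. It covers only the per-code odds ratios that feed the ML model, not the model itself. The representation of each patient history as a set of codes and the 0.5 continuity correction are assumptions of this sketch, and the codes "A" and "B" are placeholders.

```python
from collections import Counter

def odds_ratios(case_histories, control_histories, smoothing=0.5):
    """Per-code odds ratio of a code appearing in cases vs. controls.

    Each history is a set of diagnostic codes for one patient. Adding
    `smoothing` to every cell of the 2x2 table avoids division by zero.
    """
    n_case, n_ctrl = len(case_histories), len(control_histories)
    case_counts = Counter(c for h in case_histories for c in set(h))
    ctrl_counts = Counter(c for h in control_histories for c in set(h))
    ratios = {}
    for code in set(case_counts) | set(ctrl_counts):
        a = case_counts[code] + smoothing            # cases with the code
        b = n_case - case_counts[code] + smoothing   # cases without it
        c = ctrl_counts[code] + smoothing            # controls with the code
        d = n_ctrl - ctrl_counts[code] + smoothing   # controls without it
        ratios[code] = (a * d) / (b * c)
    return ratios

cases = [{"A", "B"}, {"A"}, {"A"}]
controls = [{"B"}, {"B"}, set()]
ratios = odds_ratios(cases, controls)

# Feature set for one patient: the odds ratios of the codes on file.
patient = {"A", "B"}
features = sorted(ratios[c] for c in patient)
print(features)
```

Codes that are over-represented among cases receive ratios above 1, and those over-represented among controls fall below 1.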
2. Longitudinal stochastic patterns

Map health history to trinary streams:

0: healthy
1: infections
2: other

Probabilistic Finite State Automata (PFSA)

Chattopadhyay, Ishanu, and Hod Lipson. "Abductive learning of quantized stochastic processes with probabilistic finite automata." Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 371, no. 1984 (2013): 20110543.
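One way to picture the trinary encoding is to bin a diagnosis history by week and emit one symbol per bin. In the sketch below, the weekly bin width, matching categories by code prefix, the `MM-DD-YYYY` date format, and the prefix `"J01"` used as a stand-in for the infections category are all assumptions made for illustration.

```python
from datetime import datetime

def to_trinary_stream(dx_record, category_prefixes, start, n_weeks):
    """Encode a diagnosis history as one symbol per week.

    0: no diagnosis recorded, 1: a code in the chosen category,
    2: any other code. Weekly bins and prefix matching are assumptions.
    """
    stream = [0] * n_weeks
    start_day = datetime.strptime(start, "%m-%d-%Y")
    prefixes = tuple(category_prefixes)
    for event in dx_record:
        day = datetime.strptime(event["date"], "%m-%d-%Y")
        week = (day - start_day).days // 7
        if not 0 <= week < n_weeks:
            continue  # event falls outside the observation window
        if event["code"].startswith(prefixes):
            stream[week] = 1      # a category code takes precedence
        elif stream[week] == 0:
            stream[week] = 2
    return stream

dx = [
    {"date": "08-29-2016", "code": "J01.90"},
    {"date": "09-10-2016", "code": "J01.90"},
    {"date": "09-12-2016", "code": "Z00.0"},
]
stream = to_trinary_stream(dx, ["J01"], start="08-29-2016", n_weeks=3)
print(stream)  # [1, 1, 2]
```

Running the encoder once per disease category yields one stream per category for each patient.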

Timestamped diagnostic data → choose disease category (e.g. infections) → PFSAs (specialized HMMs) inferred from code sequences

Model control and case cohorts separately

Given a new test case, compute the likelihood of the sample arising from the case models vs. the control models

Sequence likelihood defect

Huang, Yi, Victor Rotaru, and Ishanu Chattopadhyay. "Sequence likelihood divergence for fast time series comparison." Knowledge and Information Systems 65, no. 7 (2023): 3079-3098.
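The case-versus-control comparison described above can be illustrated with a much simpler model class. The sketch below swaps in first-order Markov chains for the PFSAs and uses a plain log-likelihood difference in place of the sequence likelihood defect, whose exact definition is given in the cited paper and is not reproduced here. All names and training streams are illustrative.

```python
import math

def fit_markov(streams, n_symbols=3, alpha=1.0):
    """First-order transition probabilities with add-alpha smoothing."""
    counts = [[alpha] * n_symbols for _ in range(n_symbols)]
    for s in streams:
        for prev, nxt in zip(s, s[1:]):
            counts[prev][nxt] += 1
    return [[c / sum(row) for c in row] for row in counts]

def log_likelihood(stream, model):
    """Log-probability of the observed transitions under the model."""
    return sum(math.log(model[p][n]) for p, n in zip(stream, stream[1:]))

def likelihood_gap(stream, case_model, control_model):
    """Positive when the stream is better explained by the case model."""
    return log_likelihood(stream, case_model) - log_likelihood(stream, control_model)

# Toy trinary streams: one model is fitted per cohort, separately.
case_model = fit_markov([[0, 1, 1, 1, 0, 1, 1]])
control_model = fit_markov([[0, 0, 0, 0, 2, 0, 0]])

print(likelihood_gap([0, 1, 1, 1], case_model, control_model) > 0)  # True
print(likelihood_gap([0, 0, 0, 0], case_model, control_model) < 0)  # True
```

The resulting gap can serve as one feature per disease category, alongside the odds-based features.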

case

Cloud Deployment

Theoretical formulation

Multi-cohort validation

Launch User-Accessible Platform

3 years

2 years

Data In

[
    {
        "patient_id": "P000038",
        "sex": "F",
        "birth_date": "01-01-2006",
        "DX_record": [
            {"date": "07-31-2006", "code": "Z38.00"},
            {"date": "08-07-2006", "code": "P59.9"},
            {"date": "08-29-2016", "code": "J01.90"},
            {"date": "09-10-2016", "code": "J01.90"},
            {"date": "11-14-2016", "code": "J01.91"}
        ],
        "RX_record": [
            {"date": "10-29-2011", "code": "rxLDA017"},
            {"date": "05-16-2015", "code": "rxIDG004"},
            {"date": "08-08-2015", "code": "rxIDG004"},
            {"date": "06-04-2016", "code": "rxIDD013"}
        ],
        "PROC_record": [
            {"date": "02-05-2007", "code": "90723"},
            {"date": "11-05-2007", "code": "J1100"}
        ]
    }
]

Data Out

{
  "predictions": [
    {
      "error_code": "",
      "patient_id": "P000012",
      "predicted_risk": 0.005794344620009157,
      "probability": 0.8253881317184486
    }
  ],
  "target": "TARGET"
}
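A client working with payloads of this shape might prepare the input and read the output as sketched below. The helper names are hypothetical, the `MM-DD-YYYY` date format is inferred from the example values, and treating a non-empty `error_code` as a failed prediction is an assumption, not documented behavior.

```python
import json
from datetime import datetime

RECORD_KEYS = ("DX_record", "RX_record", "PROC_record")

def load_patients(text):
    """Parse an input payload and sort each record list chronologically."""
    patients = json.loads(text)
    for patient in patients:
        for key in RECORD_KEYS:
            patient[key] = sorted(
                patient.get(key, []),
                key=lambda e: datetime.strptime(e["date"], "%m-%d-%Y"),
            )
    return patients

def risks_by_patient(text):
    """Map patient_id to predicted_risk, skipping entries with an error code."""
    response = json.loads(text)
    return {
        r["patient_id"]: r["predicted_risk"]
        for r in response["predictions"]
        if not r["error_code"]
    }

payload = """[{"patient_id": "P000038", "sex": "F", "birth_date": "01-01-2006",
  "DX_record": [{"date": "08-29-2016", "code": "J01.90"},
                {"date": "07-31-2006", "code": "Z38.00"}]}]"""
response = """{"predictions": [{"error_code": "", "patient_id": "P000012",
  "predicted_risk": 0.0058, "probability": 0.8254}], "target": "TARGET"}"""

patients = load_patients(payload)
print(patients[0]["DX_record"][0]["code"])  # Z38.00
print(risks_by_patient(response))           # {'P000012': 0.0058}
```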

Cohort Selection and Risk Analysis Testbed

Misleading Diagnosis of Idiopathic Pulmonary Fibrosis: A Clinical Concern
Javier Ramos-Rossy, MD, Onix Cantres-Fonseca, MD, Ginger Arzon-Nieves, Yomayra Otero-Dominguez, MD, Stella Baez-Corujo, MD, and William Rodríguez-Cintrón, MD

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6248220/

 

Enable more holistic approaches to medicine, where predictive patterns can be rapidly recognized and exploited

Q&A

Digital Twins for complex systems

Darkome

teomims

opinion dynamics

algorithmic lie detector

Mental health diagnosis

viral emergence

microbiome