Towards a General Theory of Digital Twins in Medicine and Social Modeling

Ishanu Chattopadhyay, PhD

Assistant Professor of Medicine

University of Kentucky

ishanu@uchicago.edu

first wave

 

rule-based systems

 

second wave

 

Big Data / ML / Deep Learning

recognize patterns, make predictions, might improve over time, but struggle on tasks not trained for

third wave

 

contextual reasoning, generalizable, towards true intelligence

  • Control Systems
  • Screening for complex diseases
  • Digital Twins in biology & medicine
  • Modeling of human behavior
  • Large Science Models
  • Robotics
  • Self-organization of sensor networks
  • Swarm Algorithms
  • data smashing
  • inverse Gillespie

PhD

Postdoc

ZeDLAB

  • Unsupervised anomaly detection

Engineering

Computer Science

Medicine

Career Trajectory

The Laboratory for Zero Knowledge Discovery

AI/ML learning theory and applications

Implications of AI for the Future of Society

Complex systems

Social interactions & opinion dynamics

Personalized medicine

collaborators

Alex Leow, Psychiatry, UIC

Anna Podolanczuk, Pulmonary Care, Weill Cornell

Gary Hunninghake, Pulmonary Critical Care, Harvard

Robert Gibbons, Biostatistics

Daniel Rubins, Anesthesia and Critical Care

Peter Smith, Pediatrics

Michael Msall, Pediatrics

Fernando Martinez, Pulmonary Critical Care, Weill Cornell

James Mastrianni, Neurology

James Evans, Sociology

Erika Claud, Pediatrics

Aaron Esser-Kahn, Molecular Engineering

David Llewellyn, University of Exeter

Kenneth Rockwood, Dalhousie University

Andrew Limper, Mayo Clinic

Department of Pediatrics

UChicago

Department of Neurology & The Memory Center

UChicago

Department of Psychiatry

UChicago

Pulmonary Critical Care, Weill Cornell

Department of Anesthesia and Critical Care

UChicago

Center for Health Statistics

UChicago

Pulmonary Critical Care, Harvard Medical School

Department of Psychiatry

UIC

Demon Network, Exeter, Alan Turing Institute, UK

Dalhousie University, Canada

Pritzker School of Molecular Engineering

Social Science

UChicago

Our Collaborations

D3M (I2O)

PAI (DSO)

PREEMPT (BTO)

YFA (DSO)

NIA

$

Predictive Modeling of Complex Systems

~3.5M USD in 5 years

Publications & Impact

Nature Medicine

Nature Human Behaviour

Nature Communications

Science Advances

(3)

PNAS

JAMA

JAHA

JACC

Modeling & predicting complex social interactions

Point-of-care screening for complex diseases

AI

Electronic Healthcare Record 

IPF

ASD

ADRD

ZeD Research Thrusts

General framework for inferring digital twins in biology and medicine

What is a Digital Twin?

Hint: probably not what the classical engineering and design industry meant in the 2000s.

Old Digital Twins:

The first use of the term "digital twin" is generally attributed to Dr. Michael Grieves in a 2002 presentation on product lifecycle management (PLM) at the University of Michigan. 

 

Dr. Grieves discussed the idea of having a virtual representation of a physical product, which would exist throughout the product's lifecycle. This digital model would be used to simulate, predict, and optimize the product's performance, both during design and after it was built. The digital twin would be continuously updated with data from the physical product, enabling real-time analysis and decision-making.

 

Connected body of models, equations, physics at multiple scales, with observational data to inform states, useful over entire life-cycle of the system

 Digital Twin: Generative AI for Complex Systems

"Physics" is unknown/emergent.

 

Data: multimodal, with disparate data types and scales; noisy, incomplete, and often unlabeled

ZCoR Suite:

Disease-specific Digital Twin

  • Predict the risk of a single, well-defined outcome
  • Maintain an individual-level model that remains operational throughout the patient's lifetime
  • As the health trajectory evolves, so does the individual's risk
  • Easy to specialize to different healthcare contexts

[Figure: IPF timeline. Labels: ZCoR screening, ~4 yrs, current clinical DX, ~4 yrs, current survival ~4 yrs]

Onishchenko, D., Marlowe, R.J., Ngufor, C.G. et al. Screening for idiopathic pulmonary fibrosis using comorbidity signatures in electronic health records. Nat Med 28, 2107–2116 (2022). https://doi.org/10.1038/s41591-022-02010-y

n=~3M

AUC~90%

Likelihood ratio ~30

Data: Onishchenko et al., Nat. Med. 2022

patient A

patient B

patient C

Beyond "risk factors" to personalized risk patterns

Up to 4-year "signal" resolution

decreases risk

increases risk

Patient Journey: Tracking Risk over time

>5 Million in US. >13 Million in next 10 years

Alzheimer's Disease and Related Dementia (ADRD)

Current practice: MoCA, blood tests

State of the art with EHR: ~67% AUC*

ZCoR: ~87%

Preempting ADRD accurately up to a decade in the future

Autism

MCHAT/F

1 in 59

36

ZeD Lab: Predictive Screening from Comorbidity Footprints

CELL Reports

Condition | ZCoR | Competition
Autism | >83% | "obvious"
Alzheimer's Disease | ~90% | 60-70%
Idiopathic Pulmonary Fibrosis | ~90% | NA
MACE | ~80% | ~70%
Bipolar Disorder | ~85% | NA
CKD | ~85% | NA
Rare Cancers (Bladder, Uterus) | ~75-80% | Low
Suicidality (with CAT-SS) | 98% PPV | Low

Off-the-shelf AI does not suffice

How?

1. Odds ratios combined via ML

Data (cases vs. controls) → odds ratios for all ICD codes → ML model → odds-based risk estimator
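A minimal sketch of the first step, assuming each patient history is reduced to the set of ICD codes it contains. The function names, the 0.5 smoothing constant (a Haldane-Anscombe style correction), the log-odds feature encoding, and the toy codes are illustrative assumptions, not the ZCoR implementation.

```python
from collections import Counter
from math import log

def odds_ratios(case_histories, control_histories, smoothing=0.5):
    """Return {code: odds ratio} for a code appearing in cases vs. controls.

    Each history is the set of ICD codes seen for one patient. `smoothing`
    is added to every cell of the 2x2 table so that no ratio is undefined
    when a code is absent from one cohort (an assumption of this sketch).
    """
    n_case, n_ctrl = len(case_histories), len(control_histories)
    case_counts = Counter(c for h in case_histories for c in set(h))
    ctrl_counts = Counter(c for h in control_histories for c in set(h))
    ratios = {}
    for code in set(case_counts) | set(ctrl_counts):
        a = case_counts[code] + smoothing             # cases with the code
        b = n_case - case_counts[code] + smoothing    # cases without it
        c = ctrl_counts[code] + smoothing             # controls with the code
        d = n_ctrl - ctrl_counts[code] + smoothing    # controls without it
        ratios[code] = (a / b) / (c / d)
    return ratios

def log_odds_features(history, ratios, codes):
    """Feature vector for a downstream ML model: log-OR if present, else 0."""
    present = set(history)
    return [log(ratios[c]) if c in present else 0.0 for c in codes]

# Toy cohorts (hypothetical patients and codes)
cases = [{"J84.10", "R05"}, {"J84.10"}, {"R05", "I10"}]
controls = [{"I10"}, {"R05"}, {"I10"}, set()]
ratios = odds_ratios(cases, controls)
codes = sorted(ratios)
features = log_odds_features({"J84.10", "I10"}, ratios, codes)
```

The feature vector would then be passed to whatever classifier plays the role of the "ML model" box; the slide does not say which one is used.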

0: healthy, 1: infections, 2: other

Probabilistic Finite State Automata (PFSA)

Map health history to trinary streams
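One way to picture the trinary encoding is sketched below. The weekly binning, the rule that a target-category code takes precedence within a week, and the `is_target` predicate are assumptions made for illustration; the slide only specifies the three symbols.

```python
from datetime import date

def trinary_stream(dx_records, start, n_weeks, is_target):
    """Encode a diagnosis history as one symbol per week.

    0: no diagnosis recorded that week ("healthy")
    1: at least one diagnosis in the target category (e.g. infections)
    2: only diagnoses outside the target category ("other")
    """
    stream = [0] * n_weeks
    for day, code in dx_records:
        week = (day - start).days // 7
        if not 0 <= week < n_weeks:
            continue                  # outside the observation window
        if is_target(code):
            stream[week] = 1          # target category takes precedence
        elif stream[week] == 0:
            stream[week] = 2
    return stream

# Toy history; the prefix test for "infection" codes is hypothetical
records = [
    (date(2016, 8, 29), "J01.90"),
    (date(2016, 9, 10), "J01.90"),
    (date(2016, 9, 20), "Z00.129"),
]
stream = trinary_stream(
    records,
    start=date(2016, 8, 29),
    n_weeks=5,
    is_target=lambda code: code.startswith("J01"),
)
```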

Chattopadhyay, Ishanu, and Hod Lipson. "Abductive learning of quantized stochastic processes with probabilistic finite automata." Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 371, no. 1984 (2013): 20110543.

2. Longitudinal stochastic patterns: PFSAs inferred from code sequences

Model control and case cohorts separately

Given a new test case, compute the likelihood of the sample arising from the case models vs. the control models

sequence likelihood defect
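The comparison can be sketched with a much simpler model class. The code below fits a first-order Markov chain to each cohort as a stand-in for an inferred PFSA and scores a new stream by the difference in log-likelihood. It is not the PFSA inference algorithm or the sequence likelihood defect of the cited papers; `MarkovModel` and `likelihood_gap` are hypothetical names.

```python
from collections import defaultdict
from math import log

class MarkovModel:
    """First-order Markov chain over symbols {0, 1, 2} with add-one smoothing."""

    def __init__(self, streams, alphabet=(0, 1, 2)):
        self.alphabet = alphabet
        self.counts = defaultdict(lambda: defaultdict(int))
        for s in streams:
            for prev, nxt in zip(s, s[1:]):
                self.counts[prev][nxt] += 1

    def log_likelihood(self, stream):
        total = 0.0
        for prev, nxt in zip(stream, stream[1:]):
            row = self.counts.get(prev, {})
            denom = sum(row.values()) + len(self.alphabet)
            total += log((row.get(nxt, 0) + 1) / denom)
        return total

def likelihood_gap(stream, case_model, control_model):
    """Positive when the stream is better explained by the case model."""
    return case_model.log_likelihood(stream) - control_model.log_likelihood(stream)

# Toy cohorts: cases show frequent "1" symbols, controls are mostly "0"
case_model = MarkovModel([[0, 1, 1, 0, 1, 1, 1, 0]])
control_model = MarkovModel([[0, 0, 0, 2, 0, 0, 0, 0]])
gap_sick = likelihood_gap([0, 1, 1, 1, 0], case_model, control_model)
gap_well = likelihood_gap([0, 0, 0, 0, 0], case_model, control_model)
```

The gap can then serve as one feature among many in the final risk estimator.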

Huang, Yi, Victor Rotaru, and Ishanu Chattopadhyay. "Sequence likelihood divergence for fast time series comparison." Knowledge and Information Systems 65, no. 7 (2023): 3079-3098.

Cloud Deployment

Theoretical formulation

Multi-cohort validation

Launch User-Accessible Platform

3 years

2 years

Data In

[
    {
        "patient_id": "P000038",
        "sex": "F",
        "birth_date": "01-01-2006",
        "DX_record": [
            {"date": "07-31-2006", "code": "Z38.00"},
            {"date": "08-07-2006", "code": "P59.9"},
            {"date": "08-29-2016", "code": "J01.90"},
            {"date": "09-10-2016", "code": "J01.90"},
            {"date": "11-14-2016", "code": "J01.91"}
        ],
        "RX_record": [
            {"date": "10-29-2011", "code": "rxLDA017"},
            {"date": "05-16-2015", "code": "rxIDG004"},
            {"date": "08-08-2015", "code": "rxIDG004"},
            {"date": "06-04-2016", "code": "rxIDD013"}
        ],
        "PROC_record": [
            {"date": "02-05-2007", "code": "90723"},
            {"date": "11-05-2007", "code": "J1100"}
        ]
    }
]

Data Out

{
  "predictions": [
    {
      "error_code": "",
      "patient_id": "P000012",
      "predicted_risk": 0.005794344620009157,
      "probability": 0.8253881317184486
    }
  ],
  "target": "TARGET"
}
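A short sketch of reading a record in the input format shown on this slide. The field names come from the sample; the `MM-DD-YYYY` date format is inferred from values such as "07-31-2006", and `age_at_events` is a hypothetical helper, not part of any published client.

```python
import json
from datetime import datetime

DATE_FORMAT = "%m-%d-%Y"  # assumed from sample values such as "07-31-2006"

def age_at_events(patient, record_key="DX_record"):
    """Return (age_in_years, code) pairs sorted by age for one patient."""
    birth = datetime.strptime(patient["birth_date"], DATE_FORMAT)
    events = []
    for entry in patient.get(record_key, []):
        when = datetime.strptime(entry["date"], DATE_FORMAT)
        age = round((when - birth).days / 365.25, 2)
        events.append((age, entry["code"]))
    return sorted(events)

# Abbreviated copy of the sample input record
payload = json.loads("""
[{"patient_id": "P000038", "sex": "F", "birth_date": "01-01-2006",
  "DX_record": [{"date": "08-29-2016", "code": "J01.90"},
                {"date": "07-31-2006", "code": "Z38.00"}]}]
""")
timeline = age_at_events(payload[0])
```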

Cohort Selection and Risk Analysis Testbed

Misleading Diagnosis of Idiopathic Pulmonary Fibrosis: A Clinical Concern
Javier Ramos-Rossy, MD, Onix Cantres-Fonseca, MD, Ginger Arzon-Nieves, Yomayra Otero-Dominguez, MD, Stella Baez-Corujo, MD, and William Rodríguez-Cintrón, MD

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6248220/

General Digital Twins

General framework for inferring digital twins in biology and medicine

Stamping Out the Next Pandemic **Before** The First Human Infection

BioNorad

\Phi_i:\prod_{j \neq i} \Sigma_j \rightarrow \mathcal{D}(\Sigma_i)

Q-Net

recursive forest

q-distance

a biologically informed, adaptive distance between strains

\theta(x,y) \triangleq \mathbf{E}_i \left( \mathbb{J}^{\frac{1}{2}} \left( \Phi_i(x_{-i}), \Phi_i(y_{-i}) \right) \right)

This distance is "special"

A smaller distance implies a quantifiably higher probability of a spontaneous jump from one strain to the other

$$\mathbb{J} \textrm{ is the Jensen-Shannon divergence}$$
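A toy sketch of the q-distance, following the slide's definition: for each position i, predict the distribution at i from the rest of each sequence, take the square root of the Jensen-Shannon divergence between the two predictions, and average over i. The uniform average standing in for the expectation and the hand-written predictor are assumptions; in a Q-Net the predictors Φ_i are learned from data.

```python
from math import log2, sqrt

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    def kl(a, b):
        return sum(ai * log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def q_distance(x, y, predictors):
    """Average over positions of sqrt(JS) between conditional predictions."""
    total = 0.0
    for i, phi in enumerate(predictors):
        x_rest = x[:i] + x[i + 1:]    # x with position i removed
        y_rest = y[:i] + y[i + 1:]
        js = js_divergence(phi(x_rest), phi(y_rest))
        total += sqrt(max(js, 0.0))   # guard against tiny negative round-off
    return total / len(predictors)

def toy_predictor(rest):
    """Hypothetical stand-in for a learned predictor: P(A), P(B) from context."""
    p_a = (rest.count("A") + 1) / (len(rest) + 2)
    return [p_a, 1 - p_a]

predictors = [toy_predictor] * 3
d_same = q_distance("AAA", "AAA", predictors)
d_near = q_distance("AAA", "AAB", predictors)
d_far = q_distance("AAA", "BBB", predictors)
```

With these toy predictors, a single substitution gives a smaller distance than replacing every position.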

Metric Structure

Tangent Bundle

geometry

dynamics

\theta(x,y) \sim Pr(x \rightarrow y)

Influenza Risk Assessment Tool (IRAT) scoring for animal strains

slow (months), quasi-subjective, expensive

*https://www.cdc.gov/flu/pandemic-resources/monitoring/irat-virus-summaries.htm

24 scores in 14 years

~10,000 strains collected annually

CDC

Emergenet time: 1 second

THE PROBLEM

Assuming a 1,000-species ecosystem and one successful experiment per day, each resolving a single two-way relationship, we would need about 1,368 years to test every pair.
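The figure can be checked with a few lines of arithmetic, assuming "possibilities" means unordered pairs of species and a 365-day year:

```python
from math import comb

species = 1000
pairs = comb(species, 2)   # unordered two-way relationships: 499,500
years = pairs / 365        # one resolved pair per day
print(f"{pairs} pairs, about {years:.0f} years")
```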

Digital Twin for the Maturing Human Microbiome 

  • Forecast microbiome maturation trajectories

 

  • Predict neurodevelopmental deficits

Boston U

U Chicago 

Two centers

Ability to "fill in" missing data is equivalent to making trajectory forecasts

predicting neurodevelopmental deficits

forecasting ecosystem trajectories

Which entities are most predictive

of neurodevelopmental deficit

entity X timestamp

SHAP value

No transplantation is guaranteed to work reliably

Just add those microbes back to reduce risk?

 

No!

Bacterial transplantation must be personalized

Future task:

Explicit supplantation profiles that are tuned to individual ecosystems

Problem: Can AI predict how we think and interact?

Can we predict how opinions evolve?

Digital Twins for complex systems

YFA 2020

Can an AI tell if you are lying?

Can an AI tell how you are going to vote?

Yang, David, James Evans, and Ishanu Chattopadhyay. "‘Its the Economy Stupid’: Predictive Theory of Belief Shift Connecting Economic Stress to Societal Polarization." (2023).

Emergent Recursive Forest in GSS

Modeling Responses to PTSD Evaluation

The Cognet Framework

Digital Twin of Opinion dynamics

predict worldviews from incomplete data

Identify malingering in psychiatric diagnoses

GSS variable | Actual (masked) | Reconstructed
spkcom | allowed | allowed
colcom | not fired | not fired
spkmil | allowed | allowed
colmil | allowed | not allowed
libmil | not remove | not remove
libhomo | not remove | not remove
reliten | strong | no religion
pray | once a day | once a day
bible | inspired word | word of god
abhlth | yes | yes
abpoor | no | no
pillok | agree | agree
intmil | very interested | very interested
abpoorw | always wrong | not wrong at all
godchnge | believe now, always have | believe now, always have
prayfreq | several times a week | several times a week
religcon | strong disagree | disagree
religint | disagree | disagree
comfort | strongly agree | neither agree nor disagree

Reconstruction

Example 1

GSS variable | Actual (masked) | Reconstructed
spkcom | allowed | allowed
colcom | not fired | not fired
libmil | not remove | not remove
libhomo | not remove | not remove
gunlaw | favor | favor
reliten | no religion | no religion
prayer | approve | approve
bible | book of fables | inspired word
abnomore | yes | yes
abhlth | yes | yes
abpoor | yes | yes
abany | yes | yes
owngun | no | no
intmil | moderately interested | moderately interested
abpoorw | not wrong at all | not wrong at all
godchnge | believe now, didn't used to | believe now, always have
prayfreq | several times a week | several times a week
religcon | strongly agree | agree
religint | strongly agree | not agree/dsagre

Reconstruction

Example 2

GSS variable | Actual (masked) | Reconstructed
spkcom | allowed | allowed
colcom | not fired | not fired
libcom | not remove | not remove
libmil | not remove | not remove
libhomo | not remove | not remove
libmslm | not remove | not remove
gunlaw | favor | favor
reliten | not very strong | strong
pray | once a week | several times a day
bible | inspired word | word of god
abdefect | yes | yes
abhlth | yes | yes
abrape | yes | yes
pillok | strongly agree | agree
shotgun | no | no
abpoorw | not wrong at all | not wrong at all
godchnge | don't believe now, used to | believe now, always have
religcon | disagree | agree
comfort | strongly agree | agree

Reconstruction

Example 3

Digital Twins for complex systems

Darkome

teomims

opinion dynamics

algorithmic lie detector

Mental health diagnosis

viral emergence

microbiome

Phase 1

Phase 2

PREPARE: Pioneering Research for Early Prediction of Alzheimer's and Related Dementias EUREKA Challenge

Algorithm for early diagnosis

Find Data for early prediction

Phase 1

Phase 2

Second Prize 40,000 USD

Let's give them:

  • Clinical data for 1M patients aged 60-80 years diagnosed with AD/ADRD
  • 1M African-American patients from Chicagoland
  • Open source, under the GNU General Public License

licensed patient data

digital twin

(generative AI)

teomims

(open cohort)

Phase 1

Phase 2

Uncorrelated, yet indistinguishable!

VeRITAS

Can A Generative AI Tell if you Are Lying?

Vetting Response Integrity from cross-Talk in Adversarial Surveys

Q-Net

Hidden structure of cross-talk between responses to interview items

PTSD diagnostic interview

Number of possible responses

Minimum Performance (n=624)

Average Time: 3.5 min

No. of questions: 20

AUC > 0.95

PPV > 0.86

NPV > 0.92

At least 83.3% sensitivity at 94% specificity
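Sensitivity and specificity do not determine PPV and NPV on their own; those also depend on how common the condition is in the tested population. The sketch below applies Bayes' rule to the slide's operating point (83.3% sensitivity, 94% specificity). The prevalence values are hypothetical and are not taken from the study cohorts.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test at the given operating point."""
    tp = sensitivity * prevalence                # true positives
    fn = (1 - sensitivity) * prevalence          # false negatives
    tn = specificity * (1 - prevalence)          # true negatives
    fp = (1 - specificity) * (1 - prevalence)    # false positives
    return tp / (tp + fp), tn / (tn + fn)

# Hypothetical prevalences, chosen only to show the trend
results = {p: predictive_values(0.833, 0.94, p) for p in (0.1, 0.3, 0.5)}
for prevalence, (ppv, npv) in results.items():
    print(f"prevalence={prevalence:.1f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

PPV rises and NPV falls as prevalence increases, which is why reported predictive values should be read together with the composition of the validation cohort.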

Minimum AUC = \(0.95 \pm 0.005\)

Cannot be coached, or memorized

Datasets for training & validation

1. VA (n=294)

2. Prolific (n=300)

3. Psychiatrists (n=30)

10^{25}

Beat the test!

200 participants in the US

100 participants in the UK

30 forensic psychiatrists

10

6

1

Can-You-Fake-PTSD Challenge Results

successful attempts

Large Science Models (LSM) and Conservation of Complexity

Large Science Models (LSMs)

  • LLMs have revolutionized how we use and manipulate human language. LSMs could likewise transform science, with AI beginning to understand hard-science concepts, theories, and data, and to reason with complex mathematical models, much as large language models "understand" human language.

Expanding the Scientific Method:  

  • Collaboration with AI can finally let us find "complex" explanations, exploring the part of nature that cannot be written down on a postage stamp

Nicholas Sizemore et al. A digital twin of the infant microbiome to predict neurodevelopmental deficits. Sci. Adv. 10, eadj0400 (2024). DOI: 10.1126/sciadv.adj0400

G_{\mu\nu}+\Lambda g_{\mu\nu} = \frac{8\pi G}{c^4}T_{\mu\nu}

E = mc^2

i\hbar\frac{\partial \psi}{\partial t} = \hat{H} \psi

Complex systems have irreducible complexity.

Generative models of complex systems must have complex structure, which can only be recovered via AI-leveraged methods

  • Kolmogorov Complexity
K(x) = \textrm{length of the smallest program describing } x
  • Two-part code
K(x) = K(S) + K( x \vert S_\star)
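K(x) itself is uncomputable, but the length of any lossless compression of x is an upper bound on it, up to an additive constant. As a rough illustration only, the sketch below uses `zlib` as the compressor and compares a highly regular string with pseudo-random bytes of the same length; the choice of compressor and the test strings are assumptions.

```python
import random
import zlib

def compressed_length(data: bytes) -> int:
    """Length in bytes of a zlib encoding: a crude upper-bound proxy for K(x)."""
    return len(zlib.compress(data, 9))

rng = random.Random(0)                       # fixed seed for repeatability
structured = b"ACGT" * 2500                  # 10,000 bytes with a short description
noisy = bytes(rng.getrandbits(8) for _ in range(10_000))  # no exploitable pattern

k_structured = compressed_length(structured)
k_noisy = compressed_length(noisy)
```

The regular string compresses to a small fraction of its length, while the pseudo-random one does not shrink at all, mirroring the gap between simple and irreducibly complex data.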

Model complexity

data to model uncertainty

K( x \vert S_\star^0) = K( S^0 \vert x_\star) = O(1)

Kolmogorov Twin

A Kolmogorov twin \(S^0\) for data \(x\) is a model that is 1) typical, 2) optimal, and 3) of maximal complexity.

K( x \vert S^0_\star) = \log \vert S^0\vert +O(1)
K(x) = K(S^0) + K( x \vert S^0_\star) +O(1)
K(S') = K(S^0) +O(1) \textrm{ for any optimal model } S'

theorem

K( x) = K( S^0 ) + O(1)

Conservation of Complexity

corollary
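The corollary follows in one step from two statements on this slide. The sketch below only rearranges those formulas, reading the twin condition K(x | S^0_\star) = O(1) as given:

```latex
\begin{aligned}
K(x) &= K(S^0) + K(x \vert S^0_\star) + O(1)
      && \text{(two-part code for an optimal model)} \\
     &= K(S^0) + O(1)
      && \text{(since } K(x \vert S^0_\star) = O(1) \text{ for a Kolmogorov twin)}
\end{aligned}
```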

Impact on Popular Discourse on AI

Media Coverage

In

National Pop-culture Discourse

Interviews, Op-eds, and Forum Appearances

  • Joe Rogan Podcast
  • Walter Isaacson Interview
  • Speaker on Pritzker Forum on Global Cities
  • >150 news articles written about published papers

Rotaru, Victor, Yi Huang, Timmy Li, James Evans, and Ishanu Chattopadhyay. "Event-level prediction of urban crime reveals a signature of enforcement bias in US cities." Nature Human Behaviour 6, no. 8 (2022): 1056-1068.

Q&A

Copy of Digital Twins in Medicine

By Ishanu Chattopadhyay
