Towards a General Theory of Digital Twins in Medicine and Social Modeling

Ishanu Chattopadhyay, PhD

Assistant Professor of Medicine

University of Kentucky

ishanu@uchicago.edu

first wave

rule-based systems

second wave

Big Data / ML / Deep Learning

recognize patterns, make predictions, might improve over time, but struggle on tasks not trained for

third wave

contextual reasoning, generalizable, towards true intelligence

Control Systems

Screening for complex diseases
Digital Twins in biology & medicine
Modeling of human behavior
Large Science Models

Robotics

Self-organization of sensor networks

Swarm Algorithms

data smashing

inverse Gillespie

PhD

Postdoc

ZeDLAB

Unsupervised anomaly detection

Engineering

Computer Science

Medicine

Career Trajectory

The Laboratory for Zero Knowledge Discovery

AI/ML learning theory and applications

Implications of AI for the Future of Society

Complex systems

Social interactions & opinion dynamics

Personalized medicine

collaborators

Alex Leow

Psychiatry UIC

Anna Podolanczuk, Pulmonary Care, Weill Cornell

Gary Hunninghake, Pulmonary Critical Care, Harvard

Robert Gibbons, Bio-statistics

Daniel Rubins, Anesthesia and Critical Care

Peter Smith, Pediatrics

Michael Msall, Pediatrics

Fernando Martinez, Pulmonary Critical Care, Weill Cornell

James Mastrianni, Neurology

James Evans, sociology

Erika Claud, Pediatrics

Aaron Esser-Kahn, Molecular Engineering

David Llewellyn

University of Exeter

Kenneth Rockwood

Dalhousie University

Andrew Limper, Mayo Clinic

Department of Pediatrics

UChicago

Department of Neurology & The Memory Center

UChicago

Department of Psychiatry

UChicago

Pulmonary Critical Care, Weill Cornell

Department of Anesthesia and Critical Care

UChicago

Center for Health Statistics

UChicago

Pulmonary Critical Care, Harvard Medical School

Department of Psychiatry

UIC

Demon Network, Exeter, Alan Turing Institute, UK

Dalhousie University, Canada

Pritzker School of Molecular Engineering

Social Science

UChicago

Our Collaborations

D3M (I2O)

PAI (DSO)

PREEMPT (BTO)

YFA (DSO)

NIA

Predictive Modeling of Complex Systems

~3.5M USD in 5 years

Publications

Impact

Nature Medicine

Nature Human Behaviour

Nature Communications

Science Advances

(3)

PNAS

JAMA

JAHA

JACC

Modeling & predicting complex social interactions

Point-of-care screening for complex diseases

Electronic Healthcare Record

IPF

ASD

ADRD

ZeD Research Thrusts

General framework for inferring digital twins in biology and medicine

What is a Digital Twin?

Hint: probably not what the classical engineering and design industry meant in the 2000s.

Old Digital Twins:

The first use of the term "digital twin" is generally attributed to Dr. Michael Grieves in a 2002 presentation on product lifecycle management (PLM) at the University of Michigan.

Dr. Grieves discussed the idea of having a virtual representation of a physical product, which would exist throughout the product's lifecycle. This digital model would be used to simulate, predict, and optimize the product's performance, both during design and after it was built. The digital twin would be continuously updated with data from the physical product, enabling real-time analysis and decision-making.

Connected body of models, equations, physics at multiple scales, with observational data to inform states, useful over entire life-cycle of the system

Digital Twin: Generative AI for Complex Systems

"Physics" is unknown/emergent.

Data: multi-modal, disparate data types and scales, noisy, incomplete, often unlabeled

ZCoR Suite:

Disease-specific Digital Twin

Predict a single well-defined outcome risk
Have a model for individuals that can remain operational throughout lifetime
As health trajectory evolves so does the risk at the individual level.
Easy to specialize in different healthcare contexts

~ 4yrs

current survival ~4yrs

~ 4yrs

current clinical DX

ZCoR screening

Onishchenko, D., Marlowe, R.J., Ngufor, C.G. et al. Screening for idiopathic pulmonary fibrosis using comorbidity signatures in electronic health records. Nat Med 28, 2107–2116 (2022). https://doi.org/10.1038/s41591-022-02010-y

n=~3M

AUC~90%

Likelihood ratio ~30
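To make the likelihood ratio concrete, the sketch below shows the standard odds-form update from a pre-test to a post-test probability. The 1% pre-test probability is a hypothetical value chosen only for illustration; the likelihood ratio of 30 is the approximate figure quoted above.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Update a pre-test probability with a likelihood ratio via odds."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre_test_probability must be strictly between 0 and 1")
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Hypothetical pre-test probability (1%); likelihood ratio ~30 from the slide.
example = post_test_probability(0.01, 30.0)
print(f"post-test probability: {example:.3f}")
```

Under these assumptions a positive flag moves a patient from 1 in 100 to roughly 1 in 4, which is why a likelihood ratio of this size matters for screening a rare condition.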

Data: Onishchenko et al., Nat. Med. 2022

patient A

patient B

patient C

Beyond "risk factors" to personalized risk patterns

Up to 4-year "signal" resolution

decreases risk

increases risk

Patient Journey: Tracking Risk over time

>5 million in the US; >13 million in the next 10 years

Alzheimer's Disease and Related Dementia

MOCA, Blood Tests

Current Practice:

State of the art with EHR:

~67% AUC*

ZCoR: ~87%

Preempting ADRD accurately up to a decade into the future

Autism

MCHAT/F

1 in 59

ZeD Lab: Predictive Screening from Comorbidity Footprints

CELL Reports

Condition	ZCoR	Competition
Autism	>83%	"obvious"
Alzheimer's Disease	~90%	60-70%
Idiopathic Pulmonary Fibrosis	~90%	NA
MACE	~80%	~70%
Bipolar Disorder	~85%	NA
CKD	~85%	NA
Rare Cancers (Bladder, Uterus)	~75-80%	Low
Suicidality (with CAT-SS)	98% PPV	Low

Off-the-shelf AI does not suffice

How?

Odds ratios combined via ML

Data

cases

control

\vdots

odds ratios for all ICD codes

ML Model

odds-based risk estimator
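The odds-ratio step named above can be sketched as follows. This is a minimal illustration, not the ZCoR implementation: the function name, the toy cohorts, and the Haldane-Anscombe smoothing constant are all assumptions, and the downstream ML model that combines the ratios is not shown.

```python
from collections import Counter


def code_odds_ratios(case_histories, control_histories, smoothing=0.5):
    """Return {code: odds ratio} for code presence in cases vs. controls.

    Each history is an iterable of diagnostic codes for one patient.
    Adding `smoothing` to every cell (Haldane-Anscombe correction, an
    assumption here) avoids division by zero for rare codes.
    """
    n_cases, n_controls = len(case_histories), len(control_histories)
    case_counts = Counter(c for h in case_histories for c in set(h))
    control_counts = Counter(c for h in control_histories for c in set(h))
    ratios = {}
    for code in set(case_counts) | set(control_counts):
        a = case_counts[code] + smoothing                  # cases with code
        b = n_cases - case_counts[code] + smoothing        # cases without code
        c = control_counts[code] + smoothing               # controls with code
        d = n_controls - control_counts[code] + smoothing  # controls without code
        ratios[code] = (a * d) / (b * c)
    return ratios


# Toy cohorts (hypothetical data, three patients each).
cases = [["R05", "J20.9"], ["R05"], ["R05", "Z00.00"]]
controls = [["Z00.00"], ["Z00.00", "R05"], ["J20.9"]]
ratios = code_odds_ratios(cases, controls)
for code, value in sorted(ratios.items()):
    print(f"{code}: {value:.2f}")
```

A ratio above 1 marks a code seen more often in cases, below 1 a code seen more often in controls; a vector of such ratios over all codes is one plausible input to the risk estimator.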

0: \textrm{healthy}\\ 1: \textrm{infections}\\ 2: \textrm{other}

Probabilistic Finite State

Map health history to trinary streams
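One way to picture the mapping to trinary streams is the sketch below, which emits one symbol per time bin using the 0/1/2 alphabet listed above. The weekly bin width, the rule that ICD-10 chapters A and B count as infections, and the precedence of infection over other codes within a bin are assumptions made for illustration only.

```python
from datetime import date

HEALTHY, INFECTION, OTHER = 0, 1, 2


def categorize(code: str) -> int:
    # Assumption: ICD-10 codes A00-B99 (infectious and parasitic diseases)
    # are treated as infections; every other code is "other".
    return INFECTION if code[:1] in ("A", "B") else OTHER


def to_ternary_stream(records, start, end, bin_days=7):
    """Encode (date, code) records as one symbol per time bin."""
    n_bins = (end - start).days // bin_days + 1
    stream = [HEALTHY] * n_bins  # bins without any code stay "healthy"
    for when, code in records:
        if not start <= when <= end:
            continue
        idx = (when - start).days // bin_days
        symbol = categorize(code)
        # Assumption: an infection overrides "other" within the same bin.
        if stream[idx] == HEALTHY or symbol == INFECTION:
            stream[idx] = symbol
    return stream


# Hypothetical four-week history.
records = [
    (date(2020, 1, 3), "A09"),
    (date(2020, 1, 16), "J45.909"),
    (date(2020, 1, 17), "B34.9"),
    (date(2020, 1, 24), "E11.9"),
]
stream = to_ternary_stream(records, date(2020, 1, 1), date(2020, 1, 28))
print(stream)  # [1, 0, 1, 2]
```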

Chattopadhyay, Ishanu, and Hod Lipson. "Abductive learning of quantized stochastic processes with probabilistic finite automata." Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 371, no. 1984 (2013): 20110543.

Longitudinal stochastic patterns

PFSAs

from code sequences

Model control and case cohorts separately

Given a new test case, compute the likelihood of the sample arising from the case models vs. the control models

sequence likelihood defect

Huang, Yi, Victor Rotaru, and Ishanu Chattopadhyay. "Sequence likelihood divergence for fast time series comparison." Knowledge and Information Systems 65, no. 7 (2023): 3079-3098.
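The case-versus-control comparison can be illustrated with a deliberately simplified stand-in: first-order Markov chains fitted to each cohort and a log-likelihood ratio for a new stream. This is not the PFSA inference or the sequence likelihood defect of the cited papers; the model class, the Laplace smoothing, and the toy streams are assumptions.

```python
import math


def fit_markov(streams, alphabet_size=3, alpha=1.0):
    """Fit a first-order Markov chain with Laplace smoothing `alpha`."""
    counts = [[alpha] * alphabet_size for _ in range(alphabet_size)]
    for stream in streams:
        for prev, nxt in zip(stream, stream[1:]):
            counts[prev][nxt] += 1
    return [[c / sum(row) for c in row] for row in counts]


def log_likelihood(stream, model):
    """Log-probability of the transitions in `stream` under `model`."""
    return sum(math.log(model[p][n]) for p, n in zip(stream, stream[1:]))


def log_likelihood_ratio(stream, case_model, control_model):
    """Positive values favour the case model, negative the control model."""
    return log_likelihood(stream, case_model) - log_likelihood(stream, control_model)


# Hypothetical trinary streams (0: healthy, 1: infections, 2: other).
case_streams = [[0, 1, 1, 0, 1, 1, 2, 1], [1, 1, 0, 1, 1, 1, 0, 1]]
control_streams = [[0, 0, 0, 0, 2, 0, 0, 0], [0, 0, 0, 1, 0, 0, 0, 0]]
case_model = fit_markov(case_streams)
control_model = fit_markov(control_streams)

print(log_likelihood_ratio([0, 1, 1, 1, 0, 1], case_model, control_model))
print(log_likelihood_ratio([0, 0, 0, 0, 0, 0], case_model, control_model))
```

The sign of the ratio gives a crude per-patient score; a separate model per cohort keeps the comparison symmetric between cases and controls.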

Cloud Deployment

Theoretical formulation

Multi-cohort validation

Launch User-Accessible Platform

3 years

2 years

[
    {
        "patient_id": "P000038",
        "sex": "F",
        "birth_date": "01-01-2006",
        "DX_record": [
            {"date": "07-31-2006", "code": "Z38.00"},
            {"date": "08-07-2006", "code": "P59.9"},
            {"date": "08-29-2016", "code": "J01.90"},
            {"date": "09-10-2016", "code": "J01.90"},
            {"date": "11-14-2016", "code": "J01.91"}
        ],
        "RX_record": [
            {"date": "10-29-2011", "code": "rxLDA017"},
            {"date": "05-16-2015", "code": "rxIDG004"},
            {"date": "08-08-2015", "code": "rxIDG004"},
            {"date": "06-04-2016", "code": "rxIDD013"}
        ],
        "PROC_record": [
            {"date": "02-05-2007", "code": "90723"},
            {"date": "11-05-2007", "code": "J1100"}
        ]
    }
]

{
  "predictions": [
    {
      "error_code": "",
      "patient_id": "P000012",
      "predicted_risk": 0.005794344620009157,
      "probability": 0.8253881317184486
    }
  ],
  "target": "TARGET"
}

Data In

Data Out
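As a rough guide to working with the input format, the sketch below loads a record shaped like the "Data In" sample and merges its diagnosis, prescription, and procedure entries into one dated timeline. The helper names are hypothetical, the record is abbreviated from the sample, and the MM-DD-YYYY date format is inferred from values such as "07-31-2006" rather than from any documented specification.

```python
import json
from datetime import date, datetime

# Abbreviated copy of the "Data In" record (same field names).
payload = json.loads("""
[
  {
    "patient_id": "P000038",
    "sex": "F",
    "birth_date": "01-01-2006",
    "DX_record": [
      {"date": "07-31-2006", "code": "Z38.00"},
      {"date": "11-14-2016", "code": "J01.91"}
    ],
    "RX_record": [{"date": "10-29-2011", "code": "rxLDA017"}],
    "PROC_record": [{"date": "02-05-2007", "code": "90723"}]
  }
]
""")


def parse_date(text: str) -> date:
    # Assumption: dates are MM-DD-YYYY.
    return datetime.strptime(text, "%m-%d-%Y").date()


def timeline(record: dict) -> list:
    """Merge DX, RX and PROC entries into one list sorted by date."""
    events = []
    for field in ("DX_record", "RX_record", "PROC_record"):
        kind = field.split("_")[0]
        for item in record.get(field, []):
            events.append((parse_date(item["date"]), kind, item["code"]))
    return sorted(events)


def age_in_years(record: dict, when: date) -> float:
    return (when - parse_date(record["birth_date"])).days / 365.25


patient = payload[0]
events = timeline(patient)
for when, kind, code in events:
    age = age_in_years(patient, when)
    print(f"{when.isoformat()}  {kind:<4}  {code:<9}  age {age:.1f}")
```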

Cohort Selection and Risk Analysis Testbed

https://paraknowledge.ai/zcor-testbed/

https://paraknowledge.ai/zcor-demo/

Misleading Diagnosis of Idiopathic Pulmonary Fibrosis: A Clinical Concern
Javier Ramos-Rossy, MD, Onix Cantres-Fonseca, MD, Ginger Arzon-Nieves, Yomayra Otero-Dominguez, MD, Stella Baez-Corujo, MD, and William Rodríguez-Cintrón, MD

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6248220/

General Digital Twins

General framework for inferring digital twins in biology and medicine

Stamping Out the Next Pandemic **Before** The First Human Infection

BioNorad

\Phi_i:\prod_{j \neq i} \Sigma_j \rightarrow \mathcal{D}(\Sigma_i)

Q-Net

recursive forest

q-distance

a biologically informed, adaptive distance between strains

\theta(x,y) \triangleq \\ \mathbf{E}_i \left ( \mathbb{J}^{\frac{1}{2}} \left (\Phi_i(x_{-i}) , \Phi_i(y_{-i})\right ) \right )

This distance is "special"

Smaller distances imply a quantifiably higher probability of a spontaneous jump

$$\mathbb{J} \textrm{ is the Jensen-Shannon divergence}$$
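The q-distance formula can be exercised on toy data as follows: for each position i, predict the distribution of symbol i from the rest of the sequence, then average the square root of the Jensen-Shannon divergence between the predictions for x and y. The predictor used here (similarity-weighted symbol frequencies over a small population) is a hypothetical stand-in for the Q-Net's learned predictors, and the alphabet and sequences are invented for illustration.

```python
import math

ALPHABET = "ACGT"


def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs (value in [0, 1])."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(u, v):
        return sum(a * math.log2(a / b) for a, b in zip(u, v) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def make_predictor(population, i, alpha=1.0):
    """Hypothetical stand-in for Phi_i: predicts symbol i from the other positions."""

    def phi(seq):
        weights = {s: alpha for s in ALPHABET}
        for member in population:
            # Weight each population member by its agreement with seq off position i.
            agreement = sum(
                1 for j, (a, b) in enumerate(zip(seq, member)) if j != i and a == b
            )
            weights[member[i]] += agreement
        total = sum(weights.values())
        return [weights[s] / total for s in ALPHABET]

    return phi


def q_distance(x, y, predictors):
    """Mean over positions of sqrt(JS(Phi_i(x_-i), Phi_i(y_-i)))."""
    values = [math.sqrt(max(js_divergence(phi(x), phi(y)), 0.0)) for phi in predictors]
    return sum(values) / len(values)


population = ["ACGT", "ACGA", "TCGT", "ACCT", "TGCA"]
predictors = [make_predictor(population, i) for i in range(4)]
print(q_distance("ACGT", "ACGT", predictors))
print(q_distance("ACGT", "TGCA", predictors))
```

Because the predictors are fitted to a population, the same pair of sequences can be near or far depending on the background they are compared against, which is the sense in which the distance is adaptive.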

Metric Structure

Tangent Bundle

geometry

dynamics

\theta(x,y) \sim Pr(x \rightarrow y)

\theta

Influenza Risk Assessment Tool (IRAT) scoring for animal strains

slow (months), quasi-subjective, expensive

*https://www.cdc.gov/flu/pandemic-resources/monitoring/irat-virus-summaries.htm

24 scores in 14 years

~10,000 strains collected annually

CDC

Emergenet time: 1 second

THE PROBLEM

Assuming a 1000 species ecosystem, and 1 successful experiment every day to discern a single two-way relationship, we would need 1,368 years to go through all possibilities.
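The figure above follows from counting unordered species pairs; a quick check, assuming one resolved pair per day and a 365-day year:

```python
from math import comb

n_species = 1000
pairs = comb(n_species, 2)   # number of two-way relationships
years = pairs / 365          # one successful experiment per day
print(pairs, int(years))     # 499500 1368
```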

Digital Twin for the Maturing Human Microbiome

Forecast microbiome maturation trajectories

Predict neurodevelopmental deficits

Boston U

U Chicago

Two centers

Ability to "fill in" missing data is equivalent to making trajectory forecasts

predicting neurodevelopmental deficits

forecasting ecosystem trajectories

Which entities are most predictive

of neurodevelopmental deficit

entity X timestamp

SHAP value

No transplantation is guaranteed to work reliably

Just add those microbes back to reduce risk?

No!

Bacterial transplantation must be personalized

Future task:

Explicit supplantation profiles that are tuned to individual ecosystems

Problem: Can AI predict how we think and interact?

Can we predict how opinions evolve?

Digital Twins for complex systems

YFA 2020

Can an AI tell if you are lying?

Can an AI tell how you are going to vote?

Yang, David, James Evans, and Ishanu Chattopadhyay. "‘Its the Economy Stupid’: Predictive Theory of Belief Shift Connecting Economic Stress to Societal Polarization." (2023).

Emergent Recursive Forest in GSS

Modeling Responses to PTSD Evaluation

The Cognet Framework

Digital Twin of Opinion dynamics

predict worldviews from incomplete data

Identify malingering in psychiatric diagnoses

GSS variable	Actual (masked)	Reconstructed
spkcom	allowed	allowed
colcom	not fired	not fired
spkmil	allowed	allowed
colmil	allowed	not allowed
libmil	not remove	not remove
libhomo	not remove	not remove
reliten	strong	no religion
pray	once a day	once a day
bible	inspired word	word of god
abhlth	yes	yes
abpoor	no	no
pillok	agree	agree
intmil	very interested	very interested
abpoorw	always wrong	not wrong at all
godchnge	believe now, always have	believe now, always have
prayfreq	several times a week	several times a week
religcon	strong disagree	disagree
religint	disagree	disagree
comfort	strongly agree	neither agree nor disagree

Reconstruction

Example 1

GSS variable	Actual (masked)	Reconstructed
spkcom	allowed	allowed
colcom	not fired	not fired
libmil	not remove	not remove
libhomo	not remove	not remove
gunlaw	favor	favor
reliten	no religion	no religion
prayer	approve	approve
bible	book of fables	inspired word
abnomore	yes	yes
abhlth	yes	yes
abpoor	yes	yes
abany	yes	yes
owngun	no	no
intmil	moderately interested	moderately interested
abpoorw	not wrong at all	not wrong at all
godchnge	believe now, didn't used to	believe now, always have
prayfreq	several times a week	several times a week
religcon	strongly agree	agree
religint	strongly agree	neither agree nor disagree

Reconstruction

Example 2

GSS variable	Actual (masked)	Reconstructed
spkcom	allowed	allowed
colcom	not fired	not fired
libcom	not remove	not remove
libmil	not remove	not remove
libhomo	not remove	not remove
libmslm	not remove	not remove
gunlaw	favor	favor
reliten	not very strong	strong
pray	once a week	several times a day
bible	inspired word	word of god
abdefect	yes	yes
abhlth	yes	yes
abrape	yes	yes
pillok	strongly agree	agree
shotgun	no	no
abpoorw	not wrong at all	not wrong at all
godchnge	don't believe now, used to	believe now, always have
religcon	disagree	agree
comfort	strongly agree	agree

Reconstruction

Example 3

Digital Twins for complex systems

Darkome

teomims

opinion dynamics

algorithmic lie detector

Mental health diagnosis

viral emergence

microbiome

Phase 1

Phase 2

PREPARE: Pioneering Research for Early Prediction of Alzheimer's and Related Dementias EUREKA Challenge

Algorithm for early diagnosis

Find Data for early prediction

Phase 1

Phase 2

Second Prize 40,000 USD

Let's give them:

Clinical data for 1M patients aged 60-80 years diagnosed with ADRD/AD
1M African-American patients from Chicagoland
Open source - GNU public license

licensed patient data

digital twin

(generative AI)

teomims

(open cohort)

Phase 1

Phase 2

Uncorrelated, yet indistinguishable !!

VeRITAS

Can A Generative AI Tell if you Are Lying?

Vetting Response Integrity from cross-Talk in Adversarial Surveys

Q-Net

Hidden structure of cross-talk between responses to interview items

PTSD diagnostic interview

Number of possible responses

Minimum Performance (n=624)

Average Time: 3.5 min

No. of questions: 20

AUC > 0.95

PPV > 0.86

NPV > 0.92

At least 83.3% sensitivity at 94% specificity

Minimum AUC = $0.95 \pm 0.005$
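PPV and NPV depend on how common the condition is among those tested, which is not stated here. The sketch below applies Bayes' rule to the quoted operating point (83.3% sensitivity at 94% specificity) for a few hypothetical prevalence values; it is meant to show the dependence, not to reproduce the reported figures.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Prevalence values are hypothetical; sensitivity/specificity are from the slide.
for prevalence in (0.1, 0.3, 0.5):
    ppv, npv = predictive_values(0.833, 0.94, prevalence)
    print(f"prevalence={prevalence:.1f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

PPV rises and NPV falls as prevalence increases, so both figures should be read against the mix of genuine and feigned responses in the evaluation cohort.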

Cannot be coached, or memorized

Datasets for training & validation

1. VA (n=294)

2. Prolific (n=300)

3. Psychiatrists (n=30)

10^{25}

Beat the test!

paraknowledge.ai/veritas

200 participants in

100 participants in

30 forensic psychiatrists

Can-You-Fake-PTSD Challenge Results

successful attempts

Large Science Models (LSM) and Conservation of Complexity

Large Science Models (LSMs)

LLMs have revolutionized how we think about using and manipulating human language. Likewise, LSMs could transform science, with AI beginning to understand hard-science concepts, theories, and data, and to reason with complex mathematical models, much as AI "understands" human language in large language models.

Expanding the Scientific Method:

AI collaboration can finally allow us to find "complex" explanations, exploring that part of nature which cannot be written down on a postage stamp

Nicholas Sizemore et al., A digital twin of the infant microbiome to predict neurodevelopmental deficits. Sci. Adv. 10, eadj0400 (2024). DOI: 10.1126/sciadv.adj0400

G_{\mu\nu}+\Lambda g_{\mu\nu} = \frac{8\pi G}{c^4}T_{\mu\nu}

E = mc^2

i\hbar\frac{\partial \psi}{\partial t} = \hat{H} \psi

Complex systems have irreducible complexity.

Generative models of complex systems must have complex structure, which can only be recovered via AI-leveraged methods

K(x) = K(S) + K( x \vert S_\star)

Two Part code

K(x) = \textrm{ length of smallest program describing } \ x

Kolmogorov Complexity
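Kolmogorov complexity is uncomputable, but the length of a compressed encoding gives a computable upper bound up to an additive constant. The sketch below uses zlib as such a proxy (an assumption of this illustration, not a method from the talk) to contrast a highly regular string with pseudo-random bytes of the same length.

```python
import random
import zlib


def compressed_length(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (a crude proxy for K)."""
    return len(zlib.compress(data, 9))


rng = random.Random(0)                                   # fixed seed for repeatability
structured = b"ab" * 2048                                # 4096 bytes, short description
noisy = bytes(rng.getrandbits(8) for _ in range(4096))   # 4096 bytes, no visible pattern

print(len(structured), compressed_length(structured))
print(len(noisy), compressed_length(noisy))
```

The regular string shrinks to a small fraction of its size while the pseudo-random one does not shrink at all, mirroring the contrast between data with a postage-stamp description and data without one.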

Model complexity

data to model uncertainty

K( x \vert S_\star^0) = K( S^0 \vert x_\star) = O(1)

Kolmogorov Twin

A Kolmogorov twin $S$ for data $x$ is a model that is 1) typical, 2) optimal, and 3) of maximal complexity.

K( x \vert S^0_\star) = \log \vert S^0\vert +O(1)

K(x) = K(S^0) + K( x \vert S^0_\star) +O(1)

K(S') = K(S^0) +O(1) \textrm{ for any optimal model } S'

theorem

K( x) = K( S^0 ) + O(1)

Conservation of Complexity

corollary

Impact on Popular Discourse on AI

Media Coverage

National Pop-culture Discourse

Interviews, Op-eds, and Forum Appearances

Joe Rogan Podcast
Walter Isaacson Interview
Speaker on Pritzker Forum on Global Cities
>150 news articles covering published papers

Rotaru, Victor, Yi Huang, Timmy Li, James Evans, and Ishanu Chattopadhyay. "Event-level prediction of urban crime reveals a signature of enforcement bias in US cities." Nature human behaviour 6, no. 8 (2022): 1056-1068.

Q&A
