AI in Medicine:

From Test-free Screening of Complex Diseases to Understanding Microbiomes, Self-organization and Zoonotic Emergence

Ishanu Chattopadhyay, PhD

Assistant Professor of Medicine

University of Chicago

ishanu@uchicago.edu

first wave

 

rule-based systems

 

second wave

 

Big Data / ML / Deep Learning

recognize patterns, make predictions, might improve over time, but struggle on tasks not trained for

third wave

 

contextual reasoning, generalizable, towards true intelligence

Rotaru, Victor, Yi Huang, Timmy Li, James Evans, and Ishanu Chattopadhyay. "Event-level prediction of urban crime reveals a signature of enforcement bias in US cities." Nature human behaviour 6, no. 8 (2022): 1056-1068.

mathematics

computer science

social science

medicine

AI/ML learning theory and applications

Complex systems

Implications of AI in the Future of Society

University of Chicago Medicine

The Laboratory for Zero Knowledge Discovery

collaborators

Alex Leow, Psychiatry, UIC

Anna Podolanczuk, Pulmonary Care, Weill Cornell

Gary Hunninghake, Pulmonary Critical Care, Harvard

Robert Gibbons, Biostatistics

Daniel Rubins, Anesthesia and Critical Care

Peter Smith, Pediatrics

Michael Msall, Pediatrics

Fernando Martinez, Pulmonary Critical Care, Weill Cornell

James Mastrianni, Neurology

James Evans, Sociology

Erika Claud, Pediatrics

Aaron Esser-Kahn, Molecular Engineering

David Llewellyn, University of Exeter

Kenneth Rockwood, Dalhousie University

Andrew Limper, Mayo Clinic

zed.uchicago.edu

  • Dr. Shahab Asoodeh

  • Dr. Yi Huang

  • Dmytro Onishchenko

  • Victor Rotaru

  • Jin Li

  • Ruolin Zhang

  • David Yang

 
  • Dr. Nicholas Sizemore

  • Drew Vlasnik

  • Lucas Mantovani

  • Jaydeep Dhanoa

  • Jasmine Mithani

  • Angela Zhang

  • Warren Mo

  • Kevin Wu

 


Department of Pediatrics

UChicago

Department of Neurology & The Memory Center

UChicago

Department of Psychiatry

UChicago

Pulmonary Critical Care, Weill Cornell

Department of Anesthesia and Critical Care

UChicago

Center for Health Statistics

UChicago

Pulmonary Critical Care, Harvard Medical School

Department of Psychiatry

UIC

Demon Network, Exeter, Alan Turing Institute, UK

Dalhousie University, Canada

Pritzker School of Molecular Engineering

Social Science

UChicago


D3M (I2O)

PAI (DSO)

PREEMPT (BTO)

YFA (DSO)

NIA

ACT I

point-of-care screening for complex diseases

Can we use existing EHR to reliably screen for complex diseases such as pulmonary fibrosis, dementia and rare cancers?

AI

Electronic Healthcare Record 

IPF

ASD

ADRD

Onishchenko, Dmytro, Robert J. Marlowe, Che G. Ngufor, Louis J. Faust, Andrew H. Limper, Gary M. Hunninghake, Fernando J. Martinez, and Ishanu Chattopadhyay. "Screening for idiopathic pulmonary fibrosis using comorbidity signatures in electronic health records." Nature Medicine 28, no. 10 (2022): 2107-2116.

Universal screening for complex diseases

ACT II

Can We Model Ecosystems As They Evolve?

Can we predict future mutations? 

Digital Twins for complex systems

Can we find generative models for microbiome dynamics?

ACT I

Universal Screening?

  • Autism
  • Idiopathic Pulmonary Fibrosis
  • Alzheimer's Disease and related dementia
  • Suicidality, PTSD
  • Perioperative Cardiac Event
  • Aggressive Melanoma
  • Uterine Cancer
  • Pancreatic Cancer
  • ...      
  • non-existent biomarkers 

 

  • expensive, time-consuming diagnostic tests
  • Lack of Universal Screening at the point of care
  • Early diagnosis is difficult, late or missed diagnosis costs lives

Is AI/ML adding anything of relevance?

"predicting" autism > 3yrs

"diagnosing" fibrosis from lung imaging

"diagnosing" dementia from  brain scan

Rapid Universal Point-of-care Screening for ILD/IPF Using Comorbidity Signatures in Electronic Health Records

Flag patients before they (or doctors) suspect 

Primary Care

Pulmonologist

?

Zero-burden Co-morbid Risk Score (ZCoR)

shortness of breath

dry cough

doctor can hear velcro crackles

Common Symptoms

>50 years old

more men than women

IPF

Rare disease

~5 in 10,000

Post-Dx

Survival

~4 years

At least one misdiagnosis

~55%

Two or more misdiagnoses

38%

Initially attributed to age-related symptoms:

72%

Cannot always be seen on CXR

Non-specific symptoms

PCP workflow demands

[Timeline figure: ZCoR screening ~4 yrs ahead of current clinical Dx; current survival ~4 yrs after Dx]

Onishchenko, D., Marlowe, R.J., Ngufor, C.G. et al. Screening for idiopathic pulmonary fibrosis using comorbidity signatures in electronic health records. Nat Med 28, 2107–2116 (2022). https://doi.org/10.1038/s41591-022-02010-y

n=~3M

AUC~90%

Likelihood ratio ~30

Conventional AI/ML attempts to model the physician

AI in IPF Research

  • Co-morbidity Patterns
  • No data demands
  • Use whatever data is already on patient file

ICD administrative codes

IPF

ILD

target codes appear

Past medical history

No target codes appear

case

control

2yrs

2yrs

prediction


IPF drugs prescribed

Signature of IPF diagnostic sequence

pirfenidone or nintedanib

  • age > 50 years
  • at least two IPF target codes identified at least 1 month apart 
  • chest CT procedure (ICD-9-CM 87.41 and Current Procedural Terminology, 4th Edition, codes 71250, 71260 and 71270) before the first diagnostic claim for IPF
  • no claims for alternative ILD codes occurring on or after the first IPF claim
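One reading of the four criteria above, as a minimal Python sketch. The `PatientClaims` record and its field names are hypothetical (real claims data would need code lookups first), and "1 month" is approximated as 30 days, which is an assumption.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class PatientClaims:
    """Hypothetical, simplified view of one patient's claims history."""
    age: int
    ipf_claim_dates: list = field(default_factory=list)        # claims with IPF target codes
    chest_ct_dates: list = field(default_factory=list)         # chest CT procedure dates
    other_ild_claim_dates: list = field(default_factory=list)  # alternative ILD codes


def matches_ipf_signature(p: PatientClaims,
                          min_gap: timedelta = timedelta(days=30)) -> bool:
    """Apply the listed criteria; the 30-day gap is an assumed stand-in for '1 month'."""
    if p.age <= 50 or len(p.ipf_claim_dates) < 2:
        return False
    first, last = min(p.ipf_claim_dates), max(p.ipf_claim_dates)
    # At least two target codes at least one month apart.
    if last - first < min_gap:
        return False
    # A chest CT must precede the first diagnostic claim for IPF.
    if not any(ct < first for ct in p.chest_ct_dates):
        return False
    # No alternative ILD claims on or after the first IPF claim.
    return all(d < first for d in p.other_ild_claim_dates)


example = PatientClaims(
    age=63,
    ipf_claim_dates=[date(2015, 1, 10), date(2015, 3, 2)],
    chest_ct_dates=[date(2014, 12, 1)],
)
print(matches_ipf_signature(example))
```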

ICD Codes can be noisy

"cases" are not always true IPF

Truven MarketScan (IBM)
Commercial Claims & Encounters Database
2003-2018

>100M patients visible 

>7B individual claims

>87K unique diagnostic codes

>7% Medicare data present

2,053,277 patients included in study

University of Chicago Medical Center 
2012-2021

68,658 patients

Random sample from OptumLabs Data Warehouse, courtesy of Mayo Clinic

861,280 patients 

2,983,215 patients in total

Data: Onishchenko et al. Nat. Medicine 2022

Comorbidity Spectra

patient A

patient B

patient C

lesson 1

Beyond "risk factors" to personalized risk patterns

False Positives: 

  • Healthcare Capacity

Ethics:

  • Risk from Imaging Tests

For every 20-30 flags,

1 is positive

  • General likelihood ratio 60-80
  • PPV 3.5-5%
  • Notifying patients 4 years early?
  • No cure, so why screen?
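The flag-rate and PPV figures above are linked by simple arithmetic: the number of flags per true positive is 1/PPV. A short sketch follows. The second function is the standard odds form of Bayes' rule; any prevalence passed to it is an assumption, since the slide does not state the prevalence in the screened cohort.

```python
def flags_per_true_positive(ppv: float) -> float:
    """Number of flagged patients per true case, given the positive predictive value."""
    return 1.0 / ppv


def ppv_from_likelihood_ratio(prevalence: float, lr_positive: float) -> float:
    """Post-test probability from pre-test prevalence and a positive likelihood ratio."""
    pre_odds = prevalence / (1.0 - prevalence)
    post_odds = pre_odds * lr_positive
    return post_odds / (1.0 + post_odds)


for ppv in (0.035, 0.05):
    print(f"PPV {ppv:.1%}: about {flags_per_true_positive(ppv):.0f} flags per true positive")
```

A PPV of 3.5-5% therefore corresponds to roughly one true positive per 20-30 flags.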

minimal

acceptable?

Better outcomes

  • early anti-fibrotic therapy seems increasingly promising
  • better shot at lung transplant
  • early dx reduces hospitalizations by a factor of 1-3

Collard, Harold R., Alex J. Ward, Stephan Lanes, D. Cortney Hayflinger, Daniel M. Rosenberg, and Elke Hunsche. "Burden of illness in idiopathic pulmonary fibrosis." Journal of medical economics 15, no. 5 (2012): 829-835.

Clinical Trial Cohort Selection

Current screen failure rate ~50-60%

ZCoR-boosted screen failure rate ~20%
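A rough sketch of what the two screen failure rates mean for trial logistics, assuming "screen failure rate" is the fraction of screened candidates who turn out to be ineligible. The enrollment target of 100 is a hypothetical number.

```python
import math


def candidates_needed(target_enrolled: int, screen_failure_rate: float) -> int:
    """Candidates to screen so that, on average, target_enrolled pass screening."""
    return math.ceil(target_enrolled / (1.0 - screen_failure_rate))


# Hypothetical enrollment target; the rates are taken from the ranges on the slide.
for label, rate in (("current, ~55%", 0.55), ("ZCoR-boosted, ~20%", 0.20)):
    print(label, candidates_needed(100, rate))
```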

Off-the-shelf AI does not suffice

lesson 2

Modeling Longitudinal Patterns

Specialized HMM models from code sequences

Model control and case cohorts separately

given a new test case, compute likelihood of sample arising from case models vs control models

sequence likelihood defect

Huang, Yi, Victor Rotaru, and Ishanu Chattopadhyay. "Sequence likelihood divergence for fast time series comparison." Knowledge and Information Systems 65, no. 7 (2023): 3079-3098.
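The idea of modeling case and control cohorts separately and then scoring a new sequence against both can be illustrated with a much simpler stand-in. The sketch below swaps in smoothed first-order Markov chains for the specialized HMMs, and the log-likelihood gap it computes is not the sequence likelihood defect as defined in the cited work. The three-letter alphabet of coarse code categories and the toy sequences are hypothetical.

```python
import math


def fit_markov_chain(sequences, alphabet, smoothing=1.0):
    """Estimate first-order transition probabilities with additive smoothing."""
    counts = {a: {b: smoothing for b in alphabet} for a in alphabet}
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1.0
    model = {}
    for a, row in counts.items():
        total = sum(row.values())
        model[a] = {b: c / total for b, c in row.items()}
    return model


def log_likelihood(seq, model):
    """Log-probability of the transitions in seq under the given chain."""
    return sum(math.log(model[p][n]) for p, n in zip(seq, seq[1:]))


def log_likelihood_gap(seq, case_model, control_model):
    """Positive values mean the sequence looks more like the case cohort."""
    return log_likelihood(seq, case_model) - log_likelihood(seq, control_model)


# Hypothetical code categories: R = respiratory, C = cardiac, O = other.
ALPHABET = "RCO"
case_model = fit_markov_chain(["RRRCR", "RRCRR"], ALPHABET)
control_model = fit_markov_chain(["OOOCO", "OCOOO"], ALPHABET)

print(log_likelihood_gap("RRRR", case_model, control_model))
print(log_likelihood_gap("OOOO", case_model, control_model))
```

Separate models per cohort keep the comparison a likelihood question rather than a single classifier's decision boundary, which is the design choice the slide points at.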

ZeD Lab: Predictive Screening from Comorbidity Footprints

Nature Medicine

JAHA

CELL Reports

Science Adv.

1 in 59

Autism Spectrum Disorder

36

Autism Co-morbid Risk (ACoR) Score

Data: Onishchenko et al. Science Advances 2021

>5 Million in US. >13 Million in next 10 years

Alzheimer's Disease and Related Dementia

Current practice: MoCA, blood tests

State of the art with EHR: ~67% AUC*

ZCoR: ~87% AUC

Application to Suicide Attempts and Ideation (SISA), PTSD*

perhaps surprising connection between mood disorders and physiological comorbidities

Gibbons RD, Kupfer D, Frank E, Moore T, Beiser DG, Boudreaux ED. Development of a Computerized Adaptive Test Suicide Scale-The CAT-SS. J Clin Psychiatry. 2017 Nov/Dec;78(9):1376-1382. doi: 10.4088/JCP.16m10922. PMID: 28493655.

* in press

The ZCoR Approach: Rapidly Retargetable

| Condition | ZeD performance | Competition |
|---|---|---|
| Autism | >80% AUC at 2 yrs | "obvious" |
| Alzheimer's Disease | ~90% AUC | 60-70% AUC |
| Idiopathic Pulmonary Fibrosis | ~90% AUC | NA |
| MACE | ~80% AUC | ~70% AUC |
| Bipolar Disorder | ~85% AUC | NA |
| CKD | ~85% AUC | NA |
| Cancers (Prostate, Bladder, Uterus, Skin) | ~75-80% AUC | Low |

Deploy all/many/most of these!

Application to Malignant Neoplasms

Melanoma

Melanoma has a high survival rate of over 90% when treated early. But if it progresses to later stages, the survival rate drops significantly. Identifying potentially life-threatening melanomas is crucial.

Medicine is poised to enter a transformative era, ushered by the emergence of sophisticated Artificial Intelligence (AI) models.

 

Enable more holistic approaches to medicine, where predictive patterns can be rapidly recognized and exploited

Uncovering A Digital Twin of the Maturing Human Microbiome 

ACT II

Sizemore, Nicholas, Kaitlyn Oliphant, Ruolin Zheng, Camilia R. Martin, Erika C. Claud, and Ishanu Chattopadhyay. "A digital twin of the infant microbiome to predict neurodevelopmental deficits." Science Advances 10, no. 15 (2024): eadj0400.

ishanu chattopadhyay

Nicholas Sizemore

Kaitlyn Oliphant

Erika Claud

THE PROBLEM

Can a microbial assay from the gut actionably pre-empt developmental markers?

Assuming a 1,000-species ecosystem and 1 successful experiment every day to discern a single two-way relationship, we would need 1,368 years to go through all possibilities. If we look for 3-way interactions, we would need 454,844 years.
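The estimate above is a counting argument: the number of unordered species subsets divided by the experiment rate. A quick sketch, assuming 365 days per year (the slide does not state the convention, so the three-way figure matches only approximately):

```python
from math import comb


def years_to_test_all(n_species: int, interaction_order: int,
                      experiments_per_day: float = 1.0,
                      days_per_year: float = 365.0) -> float:
    """Years needed to test every unordered subset of the given size once."""
    return comb(n_species, interaction_order) / experiments_per_day / days_per_year


print(comb(1000, 2), round(years_to_test_all(1000, 2)))
print(comb(1000, 3), round(years_to_test_all(1000, 3)))
```

Pairwise interactions alone give 499,500 experiments, and each added interaction order multiplies the count by a few hundred.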

2019

PREEMPT

Can we predict the next pandemic?

Can we predict future mutations? Can we define the "edge of emergence"?

Digital Twins for complex systems

Chattopadhyay, Ishanu, Kevin Wu, Jin Li, and Aaron Esser-Kahn. "Emergenet: Fast Scalable Pandemic Risk Assessment of Influenza A Strains Circulating In Non-human Hosts." (2023). Under Review in Nature

PREEMPT

\Phi_i:\prod_{j \neq i} \Sigma_j \rightarrow \mathcal{D}(\Sigma_i)

Q-Net

recursive forest

This is a general method!

Data

\(\downarrow \)

Set of interdependent predictors

How do we measure "distance" between strains?

q-distance

a biologically informed, adaptive distance between strains

\theta(x,y) \triangleq \mathbf{E}_i \left ( \mathbb{J}^{\frac{1}{2}} \left (\Phi_i^P(x_{-i}) , \Phi_i^Q(y_{-i})\right ) \right )

This distance is "special"

Smaller distances imply a quantitatively high probability of spontaneous jump

$$\mathbb{J} \textrm{ is the Jensen-Shannon divergence }$$
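A minimal sketch of how the q-distance formula can be evaluated once the per-position predictors are available. Assumptions: the expectation over i is taken as a uniform average over positions, the divergence uses the natural logarithm, and `toy_predictor` is a hypothetical stand-in for a learned predictor, not an actual Q-net component.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    def kl(a, b):
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def q_distance(x, y, predictors_p, predictors_q):
    """Average over positions i of sqrt(JS(Phi_i^P(x_-i), Phi_i^Q(y_-i)))."""
    total = 0.0
    for i, (phi_p, phi_q) in enumerate(zip(predictors_p, predictors_q)):
        x_rest = x[:i] + x[i + 1:]   # x with position i removed
        y_rest = y[:i] + y[i + 1:]
        js = js_divergence(phi_p(x_rest), phi_q(y_rest))
        total += math.sqrt(max(js, 0.0))  # guard against tiny negative rounding
    return total / len(predictors_p)


ALPHABET = "ACGT"


def toy_predictor(rest):
    """Hypothetical predictor: smoothed symbol frequencies of the other positions."""
    counts = [rest.count(s) + 1 for s in ALPHABET]
    total = sum(counts)
    return [c / total for c in counts]


predictors = [toy_predictor] * 4
print(q_distance("ACGT", "ACGT", predictors, predictors))
print(q_distance("AAAA", "TTTT", predictors, predictors))
```

With the natural logarithm the divergence is bounded by ln 2, so each term, and hence the average, lies between 0 and sqrt(ln 2).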

Metric Structure

Tangent Bundle

geometry

dynamics

Sanov's Theorem & Pinsker's Inequality

Theorem

\left \vert \ln \frac{Pr(x \rightarrow y ) }{Pr( y \rightarrow y)} \right \vert \leq \beta \theta(x,y)

Assume:

stable strain \(x_{h}\), "well-adapted" \(\Rightarrow Pr(x_h\rightarrow x_h) \approx 1 \)

For "new" strain \(x_{a}\),  \( \displaystyle \theta(x_{a},x_{h}) \approx 0 \)

Then, we have:

\left \vert \ln \frac{Pr(x_a \rightarrow x_h ) }{Pr( x_h \rightarrow x_h)} \right \vert \approx 0

\Rightarrow Pr(x_a \rightarrow x_h ) \approx Pr(x_h \rightarrow x_h )

\Rightarrow Pr(x_a \rightarrow x_h ) \approx 1

we can tell if new strain will adapt to humans

A Math Solution to a Hard Biological Problem

\rho_t(x) \triangleq -\log \min_{y \in H^t} \sqrt{\theta_{\text{HA}}^{[t]}(x,y) \cdot \theta_{\text{NA}}^{[t]}(x,y)}
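A sketch of how the score above could be computed, given distance functions for the HA and NA segments. Everything concrete here is an assumption for illustration: the dict-based strain records, the natural logarithm, the small `eps` guard against log(0), and the Hamming-fraction distances, which stand in for the q-distance only so the example runs.

```python
import math


def emergence_risk(x, human_strains, theta_ha, theta_na, eps=1e-12):
    """-log of the smallest geometric mean of HA and NA distances to any human strain."""
    nearest = min(math.sqrt(theta_ha(x, y) * theta_na(x, y)) for y in human_strains)
    return -math.log(max(nearest, eps))  # eps avoids log(0) for identical strains


def hamming_fraction(a, b):
    """Stand-in distance: fraction of differing positions (not the q-distance)."""
    return sum(c1 != c2 for c1, c2 in zip(a, b)) / len(a)


def theta_ha(x, y):
    return hamming_fraction(x["HA"], y["HA"])


def theta_na(x, y):
    return hamming_fraction(x["NA"], y["NA"])


# Hypothetical toy sequences.
humans = [{"HA": "AAAA", "NA": "CCCC"}]
near = {"HA": "AAAT", "NA": "CCCG"}
far = {"HA": "TTTT", "NA": "GGGG"}

print(emergence_risk(near, humans, theta_ha, theta_na))
print(emergence_risk(far, humans, theta_ha, theta_na))
```

Strains closer to the circulating human set get a larger score, matching the sign convention of the negative logarithm.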


Influenza Risk Assessment Tool (IRAT) scoring for animal strains

Can we replicate IRAT scores*?

slow (months), quasi-subjective, expensive

*https://www.cdc.gov/flu/pandemic-resources/monitoring/irat-virus-summaries.htm

  • genomic analysis
  • receptor binding
  • animal transmission
  • antivirals available
  • population immunity
  • human infections
  • animal hosts
  • global prevalence
  • antigenic novelty
  • disease severity


24 scores in 14 years

~10,000 strains collected annually

Emergenet: finding emergence risk of animal strains

Emergenet time: 1 second

BioNorad

Stamping Out the Next Pandemic **Before** The First Human Infection

Let's go back to the Microbiome Problem

<class>_<observation_time>
<actinobacteria>_<30wk>
<clostridia>_<28wk>

construct qnet
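The class-and-observation-time naming above suggests one variable per (taxonomic class, week) pair. A small sketch of assembling such a profile follows; the tuple input format and the numeric values are hypothetical, and unobserved variables are simply left out of the dict.

```python
def feature_name(taxon_class: str, week: int) -> str:
    """Name a variable by taxonomic class and observation time, e.g. clostridia_28wk."""
    return f"{taxon_class.lower()}_{week}wk"


def build_profile(samples):
    """samples: iterable of (taxon_class, week, value) tuples (hypothetical format)."""
    return {feature_name(c, w): v for c, w, v in samples}


profile = build_profile([
    ("Actinobacteria", 30, 0.12),  # values are made-up placeholders
    ("Clostridia", 28, 0.40),
])
print(profile)
```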

\(\phi_{\textrm{typical}}\): Q-net inferred with typical patients

\(\phi_{\textrm{deficit}}\): Q-net inferred with patients with neurodevelopmental deficit

\(\psi\): completely uninformative state

\(\psi^0\): observed state

Think of microbiome profiles as states


Risk of Time-stamped Microbial Profile to lead to Developmental Deficit

Risk = \frac{\theta_{\textrm{typical}}(\psi,\psi^0)}{\theta_{\textrm{deficit}}(\psi,\psi^0)}
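The ratio above is direct to evaluate once the two model-specific distances exist. A minimal sketch, where the distance callables, the opaque state arguments, and the `eps` guard against a zero denominator are all assumptions for illustration:

```python
def deficit_risk(theta_typical, theta_deficit, psi, psi0, eps=1e-12):
    """Ratio of the typical-model distance to the deficit-model distance."""
    return theta_typical(psi, psi0) / max(theta_deficit(psi, psi0), eps)


# Hypothetical constant distances, used only to show the call shape.
risk = deficit_risk(lambda a, b: 0.6, lambda a, b: 0.3, psi=None, psi0=None)
print(risk)
```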

How different are the typical and deficit models?

[Figure panels, typical vs. deficit models: Bacilli 30, Coriobacteria 32, Gammaproteobacteria 32]

All Patients

Feeding Variables added

Ability to "fill in" missing data is equivalent to making trajectory forecasts

Our risk measure is highly predictive and actionable

Which entities are most predictive?

Just add those microbes back?

No transplantation is guaranteed to work reliably

Predicted to reduce risk reliably

Supplantation MUST be personalized


Network Interpretations?

Typical

Deficit

Future

Answer the question: "what is a healthy microbiome?"

 

Explicit supplantation profiles that are tuned to individual ecosystems

 

Bioreactor experiments

What other problems can it solve?

Q-Nets

Digital Twins for complex systems

Mental health diagnosis

opinion dynamics

algorithmic lie detector

VeRITAS

Can A Generative AI Tell if you Are Lying?

Vetting Response Integrity from cross-Talk in Adversarial Surveys

Hidden structure of cross-talk between responses to interview items

PTSD diagnostic interview

Q-Net

Number of possible responses: \(10^{25}\)

Minimum Performance (n=624)

Average Time: 3.5 min

No. of questions: 20

AUC > 0.95

PPV > 0.86

NPV > 0.92

At least 83.3% sensitivity at 94% specificity

Minimum AUC = \(0.95 \pm 0.005\)
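For reference, the four reported quantities (sensitivity, specificity, PPV, NPV) all derive from one confusion matrix. The sketch below uses made-up counts, not the study's data, purely to show the definitions.

```python
def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard confusion-matrix metrics for a binary screening test."""
    return {
        "sensitivity": tp / (tp + fn),  # true cases correctly flagged
        "specificity": tn / (tn + fp),  # non-cases correctly cleared
        "ppv": tp / (tp + fp),          # flagged who are true cases
        "npv": tn / (tn + fn),          # cleared who are true non-cases
    }


# Hypothetical counts for illustration only.
metrics = screening_metrics(tp=45, fp=10, tn=90, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike sensitivity and specificity, PPV and NPV depend on the case mix of the evaluated sample, so they do not transfer directly to populations with a different prevalence.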

Cannot be coached or memorized

Datasets for training & validation

1. VA (n=294)

2. Prolific (n=300)

3. Psychiatrists (n=30)

Beat the test!

200 participants in US

100 participants in UK

30 forensic psychiatrists

10

6

1

Can-You-Fake-PTSD Challenge Results

successful attempts

Future

Vision

  • Universal screening for IPF, ADRD, autism, rare cancers
  • Continuous monitoring of psychological health 
  • Reconfigurable Universal Screening (PCORI)
  • Bio-NORAD
  • Microbiome-based screening, Bioreactor experiments

Transform bio-surveillance

Transform modeling of complex systems

Transform early diagnosis

Democratize AI, unleashing its power for social good

ishanu chattopadhyay

ishanu@uchicago.edu
