talk

Third-wave AI in Medicine:

From Test-free Screening of Complex Diseases

to Digital Twins of Microbiomes and Pandemics

Ishanu Chattopadhyay, PhD

Assistant Professor of Medicine

University of Chicago

ishanu@uchicago.edu

first wave

rule-based systems

second wave

Big Data / ML / Deep Learning

recognize patterns, make predictions, might improve over time, but struggle on tasks not trained for

third wave

contextual reasoning, generalizable, towards true intelligence

mathematics

computer science

social science

medicine

AI/ML learning theory and applications

Complex systems

Implications of AI for the Future of Society

University of Chicago Medicine

The Laboratory for Zero Knowledge Discovery

collaborators

Alex Leow

Psychiatry UIC

Anna Podolanczuk, Pulmonary Care, Weill Cornell

Gary Hunninghake, Pulmonary Critical Care, Harvard

Robert Gibbons, Bio-statistics

Daniel Rubins, Anesthesia and Critical Care

Peter Smith, Pediatrics

Michael Msall, Pediatrics

Fernando Martinez, Pulmonary Critical Care, Weill Cornell

James Mastrianni, Neurology

James Evans, sociology

Erika Claud, Pediatrics

Aaron Esser-Kahn, Molecular Engineering

David Llewellyn

University of Exeter

Kenneth Rockwood

Dalhousie University

Andrew Limper, Mayo Clinic

zed.uchicago.edu

Department of Pediatrics

UChicago

Department of Neurology & The Memory Center

UChicago

Department of Psychiatry

UChicago

Pulmonary Critical Care, Weill Cornell

Department of Anesthesia and Critical Care

UChicago

Center for Health Statistics

UChicago

Pulmonary Critical Care, Harvard Medical School

Department of Psychiatry

UIC

DEMON Network, Exeter, Alan Turing Institute, UK

Dalhousie University, Canada

Pritzker School of Molecular Engineering

Social Science

UChicago

Dr. Shahab Asoodeh
Dr. Yi Huang
Dmytro Onishchenko
Victor Rotaru
Jin Li
Ruolin Zhang
David Yang

Dr. Nicholas Sizemore
Drew Vlasnik
Lucas Mantovani
Jaydeep Dhanoa
Jasmine Mithani
Angela Zhang
Warren Mo
Kevin Wu

D3M (I2O)

PAI (DSO)

PREEMPT (BTO)

YFA (DSO)

NIA

Nature Medicine

Nature Human Behaviour

Nature Communications

Science Advances

(3)

PNAS

JAMA

JAHA

JACC

Publications

Impact

Research Direction I

Point-of-care screening for complex diseases

Can we reliably screen for complex diseases such as pulmonary fibrosis, dementia and rare cancers?

Research Direction II

Digital Twins

General framework for inferring digital twins in biology and medicine

Electronic Healthcare Record

IPF

ASD

ADRD

Research Direction I

"test-free" screening?

Autism
Idiopathic Pulmonary Fibrosis
Alzheimer's Disease and related dementia
Suicidality, PTSD
Perioperative Cardiac Event
Aggressive Melanoma
Uterine Cancer
Pancreatic Cancer

non-existent biomarkers

expensive, time-consuming diagnostic tests

Lack of Universal Screening at the point of care

Early diagnosis is difficult; late or missed diagnosis costs lives

We lack Universal Screening

for most diseases

Prognosis at Point-of-Diagnosis

Optimizing Management

Patient Journey

Continuous Risk Monitoring

Early Diagnosis

Universal Screening

Cohort Selection

Reduce screen failure rates

Holistic health surveillance

Predict antifibrotics continuation

improve outcomes

Interstitial Lung Disease / Pulmonary Fibrosis

Rapid Universal Point-of-care Screening for ILD/IPF Using Comorbidity Signatures in Electronic Health Records

Flag patients before they (or doctors) suspect

Primary Care

Pulmonologist

Zero-burden Co-morbid Risk Score (ZCoR)

Referral

shortness of breath

dry cough

doctor can hear velcro crackles

Non-specific Symptoms

>50 years old

more men than women

IPF

Rare disease

~5 in 10,000

Post-Dx

Survival

~4 years

Cannot always be seen on CXR

At least one misdiagnosis

~55%

Two or more misdiagnoses

38%

Initially attributed to age-related symptoms:

72%

PCP workflow demands

Known Co-morbidities of PF

Are there more? Subtle footprints in the medical history that are more heterogeneous?

[Timeline figure. Labels: ZCoR screening; current clinical Dx; current survival ~4 yrs; two intervals marked ~4 yrs]

Onishchenko, D., Marlowe, R.J., Ngufor, C.G. et al. Screening for idiopathic pulmonary fibrosis using comorbidity signatures in electronic health records. Nat Med 28, 2107–2116 (2022). https://doi.org/10.1038/s41591-022-02010-y

n=~3M

AUC~90%

Likelihood ratio ~30
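A likelihood ratio is easier to read once it is turned into a post-test probability. The sketch below does the standard odds conversion, using the ~5 in 10,000 prevalence and the likelihood ratio of ~30 quoted on these slides. Treating that single prevalence figure as the pre-test probability for every flagged patient is an assumption made only for illustration; the function name is ours.

```python
def post_test_probability(prevalence: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability into a post-test probability via odds."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    pre_odds = prevalence / (1.0 - prevalence)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Figures taken from the slides; their use as a pre-test probability is an assumption.
prevalence = 5 / 10_000
positive_lr = 30.0

flagged_risk = post_test_probability(prevalence, positive_lr)
print(f"pre-test: {prevalence:.4%}  post-test: {flagged_risk:.2%}")
```

Under these assumptions a positive flag moves a patient from roughly 0.05% to roughly 1.5% risk, which is the kind of enrichment that makes referral for confirmatory testing worthwhile.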

Conventional AI/ML attempts to model the physician

AI in IPF Research

Co-morbidity patterns
No data demands
Use whatever data is already on patient file

ICD administrative codes

IPF

ILD

target codes appear

Past medical history

No target codes appear

case

control

2yrs

prediction

IPF drugs prescribed

Signature of IPF diagnostic sequence

pirfenidone or nintedanib

age > 50 years
at least two IPF target codes identified at least 1 month apart
chest CT procedure (ICD-9-CM 87.41 and Current Procedural Terminology, 4th Edition, codes 71250, 71260 and 71270) before the first diagnostic claim for IPF
no claims for alternative ILD codes occurring on or after the first IPF claim
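The four criteria above can be read as a filter over a patient's claims. The following is a minimal sketch, not the study's implementation: the record layout (a list of `(date, code)` tuples), the reading of "1 month" as 30 days, and the placeholder strings `IPF_CODE` and `OTHER_ILD_CODE` are all assumptions. Only the chest CT codes are taken from the slide.

```python
from datetime import date, timedelta

# Chest CT codes quoted on the slide (ICD-9-CM procedure 87.41 and CPT-4 codes).
CHEST_CT_CODES = {"87.41", "71250", "71260", "71270"}


def is_ipf_case(age, claims, ipf_codes, other_ild_codes,
                min_gap=timedelta(days=30)):
    """Return True if a patient meets all four case-definition criteria.

    claims: list of (date, code) tuples. ipf_codes and other_ild_codes are
    caller-supplied code sets (hypothetical placeholders in the example below).
    """
    if age <= 50:
        return False
    ipf_dates = sorted(d for d, c in claims if c in ipf_codes)
    # Two target codes at least min_gap apart exist iff last - first >= min_gap.
    if not ipf_dates or ipf_dates[-1] - ipf_dates[0] < min_gap:
        return False
    first_ipf = ipf_dates[0]
    # A chest CT must precede the first IPF claim.
    if not any(c in CHEST_CT_CODES and d < first_ipf for d, c in claims):
        return False
    # No alternative ILD code on or after the first IPF claim.
    if any(c in other_ild_codes and d >= first_ipf for d, c in claims):
        return False
    return True


example_claims = [
    (date(2015, 1, 10), "71250"),      # chest CT
    (date(2015, 3, 1), "IPF_CODE"),    # first target code
    (date(2015, 5, 1), "IPF_CODE"),    # second target code, two months later
]
print(is_ipf_case(67, example_claims, {"IPF_CODE"}, {"OTHER_ILD_CODE"}))
```

Keeping the code sets as parameters makes the same filter reusable when the target or exclusion codes change.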

ICD Codes can be noisy

"cases" are not always true IPF

Truven MarketScan (IBM)
Commercial Claims & Encounters Database
2003-2018

>100M patients visible

>7B individual claims

>87K unique diagnostic codes

>7% Medicare data present

2,053,277 patients included in study

University of Chicago Medical Center 
2012-2021

68,658 patients

Random sample from OptumLabs Data Warehouse, courtesy of Mayo Clinic

861,280 patients

2,983,215 patients

Data: Onishchenko et al., Nat. Medicine 2022

patient A

patient B

patient C

Beyond "risk factors" to personalized risk patterns

Clinical Trial Cohort Selection

Current screen failure rate ~50-60%

ZCoR-boosted screen failure rate ~20%

cohort size: 2000

initial cohort size: 5000

initial cohort size with ZCoR: 2500

Cost per patient for confirmatory tests: ~7k USD

Savings: ~20M USD
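The savings figure follows from the numbers on this slide. The sketch below reproduces the arithmetic, taking the upper end of the current screen failure range (60%) and assuming every screened patient incurs the full confirmatory cost; the function name is ours.

```python
def initial_cohort(target_enrolled: int, screen_failure_pct: int) -> int:
    """Patients to screen so that target_enrolled pass, using exact integer math."""
    passing_pct = 100 - screen_failure_pct
    # Ceiling division without floating point.
    return -(-target_enrolled * 100 // passing_pct)


target = 2000                 # cohort size from the slide
cost_per_patient = 7_000      # USD, confirmatory tests

screened_now = initial_cohort(target, 60)        # current practice
screened_zcor = initial_cohort(target, 20)       # with ZCoR pre-screening
savings = (screened_now - screened_zcor) * cost_per_patient

print(screened_now, screened_zcor, savings)
```

This gives 5000 and 2500 patients screened and a saving of 17.5M USD, consistent with the rounded ~20M USD quoted above.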

Up to 4-year "signal" resolution

decreases risk

increases risk

Patient Journey: Tracking Risk over time

Autism

1 in 59

MCHAT/F

Alzheimer's Disease and Related Dementia*

* in press

>5 Million in US. >13 Million in next 10 years

Alzheimer's Disease and Related Dementia

MOCA, Blood Tests

Current Practice:

State of the art with EHR:

~67% AUC*

ZCoR: ~87%

Preempting ADRD accurately up to a decade into the future

Applicable To Screening for Mild Cognitive Impairment

Clinical Trial Participant Selection

Current screen-failure rate: 80-90%

Estimated rate with ZCoR:

40%

ZeD Lab: Predictive Screening from Comorbidity Footprints

CELL Reports

Condition	ZCoR	Competition
Autism	>83%	"obvious"
Alzheimer's Disease	~90%	60-70%
Idiopathic Pulmonary Fibrosis	~90%	NA
MACE	~80%	~70%
Bipolar Disorder	~85%	NA
CKD	~85%	NA
Rare Cancers (Bladder, Uterus)	~75-80%	Low
Suicidality (with CAT-SS)	98% PPV	Low

Off-the-shelf AI does not suffice

How?

Odds ratios combined via ML

Data

cases

control

\vdots

odds ratios for all ICD codes

ML Model

odds-based risk estimator
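The first ingredient above, odds ratios for all ICD codes, can be sketched from case and control histories as follows. The 0.5 continuity correction and the naive sum of log odds ratios are our simplifications for illustration; the slides say only that the odds ratios are combined via an ML model, which this sketch does not reproduce.

```python
import math
from collections import Counter


def odds_ratios(case_histories, control_histories):
    """Per-code odds ratio of appearing in a case vs. a control history.

    Each history is an iterable of ICD codes. A 0.5 continuity correction
    (an assumption of this sketch) avoids division by zero.
    """
    n_case, n_ctrl = len(case_histories), len(control_histories)
    case_counts = Counter(c for h in case_histories for c in set(h))
    ctrl_counts = Counter(c for h in control_histories for c in set(h))
    ratios = {}
    for code in set(case_counts) | set(ctrl_counts):
        a = case_counts[code] + 0.5            # cases with the code
        b = n_case - case_counts[code] + 0.5   # cases without it
        c = ctrl_counts[code] + 0.5            # controls with the code
        d = n_ctrl - ctrl_counts[code] + 0.5   # controls without it
        ratios[code] = (a * d) / (b * c)
    return ratios


def naive_log_odds_score(history, ratios):
    """Stand-in for the ML combiner: sum of log odds ratios of observed codes."""
    return sum(math.log(ratios[c]) for c in set(history) if c in ratios)


cases = [{"A", "B"}, {"A"}, {"A", "C"}]        # toy code labels, not real ICD codes
controls = [{"B"}, {"C"}, {"B", "C"}]
ratios = odds_ratios(cases, controls)
print(naive_log_odds_score({"A"}, ratios) > naive_log_odds_score({"B"}, ratios))
```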

0: \textrm{healthy}\\ 1: \textrm{infections}\\ 2: \textrm{other}

Probabilistic Finite State

Map health history to trinary streams

Chattopadhyay, Ishanu, and Hod Lipson. "Abductive learning of quantized stochastic processes with probabilistic finite automata." Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 371, no. 1984 (2013): 20110543.

Longitudinal stochastic patterns

PFSAs

from code sequences

Model control and case cohorts separately

Given a new test case, compute the likelihood of the sample arising from case models vs. control models

sequence likelihood defect

Huang, Yi, Victor Rotaru, and Ishanu Chattopadhyay. "Sequence likelihood divergence for fast time series comparison." Knowledge and Information Systems 65, no. 7 (2023): 3079-3098.
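The idea of scoring a new trinary stream against case and control models can be illustrated with a much simpler model class. The sketch below swaps the PFSAs for first-order Markov chains with add-one smoothing, which is an assumption made purely to keep the example short; it is not the inference algorithm of the cited papers, and the function names are ours.

```python
import math
from collections import defaultdict

ALPHABET = (0, 1, 2)  # 0: healthy, 1: infections, 2: other (as on the slide)


def fit_markov(streams, alpha=1.0):
    """Fit a smoothed first-order Markov chain (a stand-in for a PFSA)."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in streams:
        for prev, nxt in zip(s, s[1:]):
            counts[prev][nxt] += 1.0
    model = {}
    for prev in ALPHABET:
        total = sum(counts[prev][x] for x in ALPHABET) + alpha * len(ALPHABET)
        model[prev] = {x: (counts[prev][x] + alpha) / total for x in ALPHABET}
    return model


def log_likelihood(stream, model):
    """Log-probability of the transitions in stream under the model."""
    return sum(math.log(model[p][n]) for p, n in zip(stream, stream[1:]))


def likelihood_gap(stream, case_model, control_model):
    """Positive when the stream is better explained by the case model."""
    return log_likelihood(stream, case_model) - log_likelihood(stream, control_model)


# Toy cohorts: cases show runs of infections, controls are mostly healthy.
case_model = fit_markov([[0, 1, 1, 1, 0, 1, 1], [1, 1, 1, 0, 1, 1, 1]])
control_model = fit_markov([[0, 0, 0, 0, 2, 0, 0], [0, 0, 2, 0, 0, 0, 0]])

print(likelihood_gap([0, 1, 1, 1], case_model, control_model) > 0)
print(likelihood_gap([0, 0, 0, 0], case_model, control_model) < 0)
```

A gap of this kind, computed per disease category, is the sort of feature a downstream classifier can combine with the odds-based estimates.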

Cloud Deployment

Theoretical formulation

Multi-cohort validation

Launch User-Accessible Platform

3 years

2 years

[
    {
        "patient_id": "P000038",
        "sex": "F",
        "birth_date": "01-01-2006",
        "DX_record": [
            {"date": "07-31-2006", "code": "Z38.00"},
            {"date": "08-07-2006", "code": "P59.9"},
            {"date": "08-29-2016", "code": "J01.90"},
            {"date": "09-10-2016", "code": "J01.90"},
            {"date": "11-14-2016", "code": "J01.91"}
        ],
        "RX_record": [
            {"date": "10-29-2011", "code": "rxLDA017"},
            {"date": "05-16-2015", "code": "rxIDG004"},
            {"date": "08-08-2015", "code": "rxIDG004"},
            {"date": "06-04-2016", "code": "rxIDD013"}
        ],
        "PROC_record": [
            {"date": "02-05-2007", "code": "90723"},
            {"date": "11-05-2007", "code": "J1100"}
        ]
    }
]

{
  "predictions": [
    {
      "error_code": "",
      "patient_id": "P000012",
      "predicted_risk": 0.005794344620009157,
      "probability": 0.8253881317184486
    }
  ],
  "target": "TARGET"
}

Data In

Data Out
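A small client-side sketch can check a record against the "Data In" layout and read risks from a "Data Out" response. Field names and the MM-DD-YYYY date format are taken from the samples above; which fields are mandatory is an assumption, and no network call is shown because the slides do not document the endpoint.

```python
import json
from datetime import datetime

DATE_FORMAT = "%m-%d-%Y"  # matches dates such as "07-31-2006" in the sample input


def validate_patient(record):
    """Raise ValueError if a record does not match the sample layout.

    Treating these four fields as required is an assumption of this sketch.
    """
    for key in ("patient_id", "sex", "birth_date", "DX_record"):
        if key not in record:
            raise ValueError(f"missing field: {key}")
    datetime.strptime(record["birth_date"], DATE_FORMAT)
    for section in ("DX_record", "RX_record", "PROC_record"):
        for entry in record.get(section, []):
            datetime.strptime(entry["date"], DATE_FORMAT)  # ValueError if malformed
            if not entry["code"]:
                raise ValueError(f"empty code in {section}")


def risks_by_patient(response_text):
    """Map patient_id to predicted_risk for predictions with an empty error_code."""
    response = json.loads(response_text)
    return {
        p["patient_id"]: p["predicted_risk"]
        for p in response["predictions"]
        if not p["error_code"]
    }


sample_patient = {
    "patient_id": "P000038",
    "sex": "F",
    "birth_date": "01-01-2006",
    "DX_record": [{"date": "07-31-2006", "code": "Z38.00"}],
}
sample_response = json.dumps({
    "predictions": [{
        "error_code": "",
        "patient_id": "P000012",
        "predicted_risk": 0.005794344620009157,
        "probability": 0.8253881317184486,
    }],
    "target": "TARGET",
})

validate_patient(sample_patient)
print(risks_by_patient(sample_response))
```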

Cohort Selection and Risk Analysis Testbed

https://paraknowledge.ai/zcor-testbed/

https://paraknowledge.ai/zcor-demo/

Misleading Diagnosis of Idiopathic Pulmonary Fibrosis: A Clinical Concern
Javier Ramos-Rossy, MD, Onix Cantres-Fonseca, MD, Ginger Arzon-Nieves, Yomayra Otero-Dominguez, MD, Stella Baez-Corujo, MD, and William Rodríguez-Cintrón, MD

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6248220/

Enable more holistic approaches to medicine, where predictive patterns can be rapidly recognized and exploited

Research Direction II

Digital Twins

General framework for inferring digital twins in biology and medicine

Chattopadhyay, Ishanu, Kevin Wu, Jin Li, and Aaron Esser-Kahn. "Emergenet: Fast Scalable Pandemic Risk Assessment of Influenza A Strains Circulating In Non-human Hosts." (2023). Under Review in Nature

PREEMPT

Predicting Future Mutations for Viral Genomes in the Wild

predict future emergence risk

\Phi_i:\prod_{j \neq i} \Sigma_j \rightarrow \mathcal{D}(\Sigma_i)

Q-Net

recursive forest

q-distance

a biologically informed, adaptive distance between strains

\theta(x,y) \triangleq \mathbf{E}_i \left( \mathbb{J}^{\frac{1}{2}} \left( \Phi_i(x_{-i}), \Phi_i(y_{-i}) \right) \right)

This distance is "special"

Smaller distances imply a quantifiably higher probability of a spontaneous jump

$$\mathbb{J} \textrm{ is the Jensen-Shannon divergence}$$
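The q-distance definition can be made concrete with a toy example: average, over positions i, the square root of the Jensen-Shannon divergence between the conditional distributions predicted for position i from the rest of each sequence. In the sketch below the predictors are hand-written lookup functions, which is an assumption; in the talk they are inferred from data as a Q-Net. Base-2 logarithms are also our choice.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base-2 logs) between two dict distributions."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0.0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def q_distance(x, y, predictors):
    """Mean over positions i of sqrt(JS(Phi_i(x_{-i}), Phi_i(y_{-i})))."""
    total = 0.0
    for i, phi in enumerate(predictors):
        x_rest = x[:i] + x[i + 1:]   # x with position i removed
        y_rest = y[:i] + y[i + 1:]
        total += math.sqrt(max(0.0, js_divergence(phi(x_rest), phi(y_rest))))
    return total / len(predictors)


def toy_predictor(rest):
    """Hypothetical Phi_i: the hidden letter tends to match its neighbour."""
    if rest == "A":
        return {"A": 0.9, "C": 0.1}
    return {"A": 0.2, "C": 0.8}


predictors = [toy_predictor, toy_predictor]   # one per position of a length-2 sequence

print(q_distance("AA", "AA", predictors))
print(q_distance("AA", "CC", predictors))
```

Because the distance compares predicted distributions rather than raw letters, two sequences that differ only where the model expects variation stay close, while a change the model considers unlikely moves them apart.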

Metric Structure

Tangent Bundle

geometry

dynamics

\theta(x,y) \sim Pr(x \rightarrow y)

\theta

Influenza Risk Assessment Tool (IRAT) scoring for animal strains

slow (months), quasi-subjective, expensive

*https://www.cdc.gov/flu/pandemic-resources/monitoring/irat-virus-summaries.htm

24 scores in 14 years

~10,000 strains collected annually

CDC

Emergenet time: 1 second

Stamping Out the Next Pandemic **Before** The First Human Infection

Bio-NORAD

THE PROBLEM

Assuming a 1,000-species ecosystem and one successful experiment per day to discern a single two-way relationship, we would need about 1,368 years to go through all possibilities.
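The figure above is a pair count divided by the number of days in a year. A quick check, assuming unordered species pairs and a 365-day year:

```python
import math

species = 1000
pairs = math.comb(species, 2)   # unordered two-way relationships
years = pairs / 365             # one successful experiment per day

print(pairs, round(years, 1))
```

This gives 499,500 pairs and about 1,368.5 years, and that is before any higher-order interactions are considered.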

Digital Twin for the Maturing Human Microbiome

Forecast microbiome maturation trajectories

Predict neurodevelopmental deficits

Boston U

U Chicago

Two centers

Ability to "fill in" missing data is equivalent to making trajectory forecasts

predicting neurodevelopmental deficits

forecasting ecosystem trajectories

Which entities are most predictive

of neurodevelopmental deficit

entity × timestamp

SHAP value

No transplantation is guaranteed to work reliably

Just add those microbes back to reduce risk?

No!

Bacterial transplantation must be personalized

Future task:

Explicit supplantation profiles that are tuned to individual ecosystems

Mental health diagnosis

opinion dynamics

microbiome

viral emergence

Digital Twins for complex systems

algorithmic lie detector

teomims

Darkome

What other problems can it solve?

Phase 1

Phase 2

Let's give them:

Clinical data for 1M patients aged 60-80 diagnosed with ADRD/AD
1M African-American patients from Chicagoland
Open source - GNU General Public License

teomims

(open cohort)

licensed patient data

digital twin

(generative AI)

PREPARE: Pioneering Research for Early Prediction of Alzheimer's and Related Dementias EUREKA Challenge

Algorithm for early diagnosis

Find Data for early prediction

Phase 1

Phase 2

Uncorrelated, yet indistinguishable !!

VeRITAS

Can a Generative AI Tell If You Are Lying?

Vetting Response Integrity from cross-Talk in Adversarial Surveys

Hidden structure of cross-talk between responses to interview items

PTSD diagnostic interview

Q-Net

Number of possible responses

10^{25}

Minimum Performance (n=624)

Average Time: 3.5 min

No. of questions: 20

AUC > 0.95

PPV > 0.86

NPV > 0.92

At least 83.3% sensitivity at 94% specificity

Minimum AUC = $0.95 \pm 0.005$
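Sensitivity and specificity can be turned into likelihood ratios, which do not depend on prevalence, and into predictive values, which do. The sketch below uses the 83.3% sensitivity and 94% specificity operating point quoted above; the 30% prevalence passed to `predictive_values` is a hypothetical input, not a figure from the study, so the resulting predictive values are illustrative only.

```python
def likelihood_ratios(sensitivity: float, specificity: float):
    """Return (LR+, LR-) for a binary test."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) at a given prevalence via Bayes' rule."""
    tp = sensitivity * prevalence
    fp = (1.0 - specificity) * (1.0 - prevalence)
    fn = (1.0 - sensitivity) * prevalence
    tn = specificity * (1.0 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


sens, spec = 0.833, 0.94                      # operating point from the slide
lr_pos, lr_neg = likelihood_ratios(sens, spec)
ppv, npv = predictive_values(sens, spec, prevalence=0.30)  # hypothetical prevalence

print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```

At this operating point LR+ is close to 14, so a flagged response multiplies the prior odds of a non-genuine report by roughly that factor.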

Cannot be coached, or memorized

Datasets for training & validation

1. VA (n=294)

2. Prolific (n=300)

3. Psychiatrists (n=30)

Beat the test!

paraknowledge.ai/veritas

200 participants in

100 participants in

30 forensic psychiatrists

Can-You-Fake-PTSD Challenge Results

successful attempts

Future

Vision

Universal screening for IPF, ADRD, autism, rare cancers
Continuous monitoring of psychological health
Reconfigurable Universal Screening (PCORI)
Bio-NORAD
Microbiome-based screening, Bioreactor experiments

Transform bio-surveillance

Transform modeling of complex systems

Transform early diagnosis

Democratize AI, unleashing its power for social good

Ishanu Chattopadhyay

ishanu@uchicago.edu

Impact on Popular Discourse on AI

National Pop-culture Discourse

Interviews, Op-eds, and Forum Appearances

Joe Rogan Podcast
Walter Isaacson Interview
Speaker at the Pritzker Forum on Global Cities
>150 news articles covering published papers

Media Coverage

Rotaru, Victor, Yi Huang, Timmy Li, James Evans, and Ishanu Chattopadhyay. "Event-level prediction of urban crime reveals a signature of enforcement bias in US cities." Nature Human Behaviour 6, no. 8 (2022): 1056-1068.

Q&A