UTAH

Third Wave AI in Medicine:

From Test-free Screening of Complex Diseases

to Understanding Microbiome Self-organization and Zoonotic Emergence

Ishanu Chattopadhyay, PhD

Assistant Professor of Medicine

University of Chicago

ishanu@uchicago.edu

mathematics

computer science

social science

medicine

AI/ML learning theory and applications

Complex systems

Implications of AI for the Future of Society

University of Chicago Medicine

The Laboratory for Zero Knowledge Discovery

collaborators

Alex Leow

Psychiatry UIC

Anna Podolanczuk, Pulmonary Care, Weill Cornell

Gary Hunninghake, Pulmonary Critical Care, Harvard

Robert Gibbons, Bio-statistics

Daniel Rubins, Anesthesia and Critical Care

Peter Smith, Pediatrics

Michael Msall, Pediatrics

Fernando Martinez, Pulmonary Critical Care, Weill Cornell

James Mastrianni, Neurology

James Evans, sociology

Erika Claud, Pediatrics

Aaron Esser-Kahn, Molecular Engineering

David Llewellyn

University of Exeter

Kenneth Rockwood

Dalhousie University

Andrew Limper, Mayo Clinic

zed.uchicago.edu

Dr. Shahab Asoodeh
Dr. Yi Huang
Dmytro Onishchenko
Victor Rotaru
Jin Li
Ruolin Zheng
David Yang

Dr. Nicholas Sizemore
Drew Vlasnik
Lucas Mantovani
Jaydeep Dhanoa
Jasmine Mithani
Angela Zhang
Warren Mo
Kevin Wu

Department of Pediatrics

UChicago

Department of Neurology & The Memory Center

UChicago

Department of Psychiatry

UChicago

Pulmonary Critical Care, Weill Cornell

Department of Anesthesia and Critical Care

UChicago

Center for Health Statistics

UChicago

Pulmonary Critical Care, Harvard Medical School

Department of Psychiatry

UIC

Demon Network, Exeter, Alan Turing Institute, UK

Dalhousie University, Canada

Pritzker School of Molecular Engineering

Social Science

UChicago

D3M (I2O)

PAI (DSO)

PREEMPT (BTO)

YFA (DSO)

NIA

ACT I

Late or missed diagnosis of serious illnesses

Can we use existing EHR to reliably screen for complex diseases such as pulmonary fibrosis, dementia and rare cancers?

Electronic Healthcare Record

IPF

ASD

ADRD

Onishchenko, Dmytro, Robert J. Marlowe, Che G. Ngufor, Louis J. Faust, Andrew H. Limper, Gary M. Hunninghake, Fernando J. Martinez, and Ishanu Chattopadhyay. "Screening for idiopathic pulmonary fibrosis using comorbidity signatures in electronic health records." Nature Medicine 28, no. 10 (2022): 2107-2116.

Universal screening for complex diseases

ACT II

Can We Model Ecosystems As They Evolve?

Can we predict future mutations?

Digital Twins for complex systems

Nicholas Sizemore, Kaitlyn Oliphant, Ruolin Zheng, Camilia Martin, Erika Claud and Ishanu Chattopadhyay, A Digital Twin of the Infant Microbiome to Predict Neurodevelopmental Deficits, Science Advances, 2024, In Press

Can we find generative models for microbiome dynamics?

Chattopadhyay, Ishanu, Kevin Wu, Jin Li, and Aaron Esser-Kahn. "Emergenet: Fast Scalable Pandemic Risk Assessment of Influenza A Strains Circulating In Non-human Hosts." (2023). Under Review in Nature

ACT I

The need for Universal Screening

Often the problem is not that diseases cannot be diagnosed by physicians, but one of missed or late diagnoses in the primary care workflow

Takes too long,

not supported by insurance,

"gut feeling" / "wait & see" common

Universal screening for many diseases is non-existent

Is AI/ML adding anything of relevance?

"predicting" autism > 3yrs

"diagnosing" fibrosis from lung imaging

"diagnosing" dementia from brain scan

Rapid Universal Point-of-care Screening for ILD/IPF Using Comorbidity Signatures in Electronic Health Records

Flag patients before they (or doctors) suspect

Primary Care

Pulmonologist

Zero-burden Co-morbid Risk Score (ZCoR)

shortness of breath

dry cough

doctor can hear velcro crackles

Common Symptoms

>50 years old

more men than women

IPF

Rare disease

~5 in 10,000

Post-Dx

Survival

~4 years

At least one misdiagnosis

~55%

Two or more misdiagnoses

38%

Initially attributed to age-related symptoms:

72%

Cannot always be seen on CXR

Non-specific symptoms

PCP workflow demands

~ 4yrs

current survival ~4yrs

~ 4yrs

current clinical DX

ZCoR screening

Onishchenko, D., Marlowe, R.J., Ngufor, C.G. et al. Screening for idiopathic pulmonary fibrosis using comorbidity signatures in electronic health records. Nat Med 28, 2107–2116 (2022). https://doi.org/10.1038/s41591-022-02010-y

n=~3M

AUC~90%

Likelihood ratio ~30

Conventional AI/ML attempts to model the physician

AI in IPF Research

Co-morbidity Patterns
No data demands
Use whatever data is already on patient file

ICD administrative codes

IPF

ILD

target codes appear

Past medical history

No target codes appear

case

control

2yrs

prediction

IPF drugs prescribed

Signature of IPF diagnostic sequence

pirfenidone or nintedanib

age > 50 years
at least two IPF target codes identified at least 1 month apart
chest CT procedure (ICD-9-CM 87.41 and Current Procedural Terminology, 4th Edition, codes 71250, 71260 and 71270) before the first diagnostic claim for IPF
no claims for alternative ILD codes occurring on or after the first IPF claim

ICD Codes can be noisy

"cases" are not always true IPF

Truven MarketScan (IBM)
Commercial Claims & Encounters Database
2003-2018

>100M patients visible

>7B individual claims

>87K unique diagnostic codes

>7% Medicare data present

2,053,277 patients included in study

University of Chicago Medical Center
2012-2021

68,658 patients

Random sample from OptumLabs Data Warehouse, courtesy of Mayo Clinic

861,280 patients

2,983,215 patients

Data: Onishchenko et al. Nature Medicine 2022

performance tables

Marketscan Out-of-sample Results

specificity ~99%

NPV>99.9%

IPF

ILD

performance tables

UCM Out-of-sample Results

specificity ~99%

NPV>99.9%

IPF

ILD

Comorbidity Spectra

patient A

patient B

patient C

lesson 1

Beyond "risk factors" to personalized risk patterns

False Positives:

Healthcare Capacity

Ethics:

Risk from Imaging Tests

For every 20-30 flags,

1 is positive

General likelihood ratio 60-80
PPV 3.5-5%
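
How a likelihood ratio and a prevalence combine into a PPV can be sketched with the odds form of Bayes' rule. This is a back-of-the-envelope illustration, not the study's calculation: the prevalence of 5 in 10,000 is assumed from the earlier "Rare disease" slide, and the function name is hypothetical. Under that assumption, likelihood ratios of 60 and 80 give PPVs of roughly 3% to 4%; the screened cohort's actual prevalence may differ, so the numbers need not reproduce the slide exactly.

```python
def post_test_probability(prevalence: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability and a positive likelihood ratio into a PPV."""
    pre_odds = prevalence / (1.0 - prevalence)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Assumed inputs: prevalence of ~5 in 10,000 and likelihood ratios of 60 and 80.
ASSUMED_PREVALENCE = 5e-4
for lr in (60.0, 80.0):
    ppv = post_test_probability(ASSUMED_PREVALENCE, lr)
    print(f"LR={lr:.0f}: PPV={ppv:.2%}, about 1 true case per {1 / ppv:.0f} flags")
```

The same function shows why a high likelihood ratio matters for a rare disease: with a likelihood ratio of 1 the PPV simply equals the prevalence.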

Notifying patients 4 years early?

No cure, why screen

minimal

acceptable?

Better outcomes

early anti-fibrotic therapy seems increasingly promising

better shot at lung transplant

early dx reduces hospitalizations by a factor of 1-3

Collard, Harold R., Alex J. Ward, Stephan Lanes, D. Cortney Hayflinger, Daniel M. Rosenberg, and Elke Hunsche. "Burden of illness in idiopathic pulmonary fibrosis." Journal of medical economics 15, no. 5 (2012): 829-835.

Clinical Trial Cohort Selection

Current screen failure rate ~50-60%

ZCoR boosted screen failure rate ~20%

Longitudinal history is important

lesson 2

Off-the-shelf AI does not suffice

lesson 3

Leveraging Longitudinal Patterns

Specialized HMM models from code sequences

Model control and case cohorts separately

given a new test case, compute likelihood of sample arising from case models vs control models

sequence likelihood defect
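
The idea of modeling case and control cohorts separately and then comparing likelihoods can be sketched as follows. This is a deliberately simplified stand-in: it uses plain first-order Markov chains with Laplace smoothing rather than the specialized HMM models named on the slide, and the alphabet, toy histories, and function names are all made up for illustration.

```python
import math
from collections import defaultdict


def fit_markov_chain(sequences, alphabet, alpha=1.0):
    """Fit first-order transition probabilities with additive (Laplace) smoothing."""
    counts = {a: defaultdict(float) for a in alphabet}
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1.0
    model = {}
    for a in alphabet:
        total = sum(counts[a].values()) + alpha * len(alphabet)
        model[a] = {b: (counts[a][b] + alpha) / total for b in alphabet}
    return model


def log_likelihood(seq, model):
    """Log-probability of the observed transitions in seq under the model."""
    return sum(math.log(model[prev][nxt]) for prev, nxt in zip(seq, seq[1:]))


def likelihood_score(seq, case_model, control_model):
    """Positive when the sequence is better explained by the case model."""
    return log_likelihood(seq, case_model) - log_likelihood(seq, control_model)


# Toy code sequences over a made-up alphabet of diagnostic categories.
alphabet = ["resp", "cardio", "other"]
case_histories = [["other", "resp", "resp", "cardio", "resp"],
                  ["resp", "resp", "other", "resp", "resp"]]
control_histories = [["other", "other", "cardio", "other", "other"],
                     ["cardio", "other", "other", "other", "cardio"]]

case_model = fit_markov_chain(case_histories, alphabet)
control_model = fit_markov_chain(control_histories, alphabet)

new_patient = ["other", "resp", "resp", "resp"]
print(likelihood_score(new_patient, case_model, control_model))
```

A score of this kind would be one feature among many in a real pipeline; here it only illustrates the case-versus-control likelihood comparison.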

ZeD Lab: Predictive Screening from Comorbidity Footprints

Nature Medicine

JAHA

CELL Reports

Science Adv.

1 in 59

Autism Spectrum Disorder

ASD: Ineffective screening causes delays and incurs costs

Autism Co-morbid Risk (ACoR) Score

Data: Onishchenko et al. Science Advances 2021

Autism Co-morbid Risk (ACoR) Score

MCHAT/F

Head to head comparison with current practice

Data: Onishchenko et al. Science Advances 2021

Joint Operation with MCHAT

PPV=\frac{1}{1+\frac{1-c}{s}\left ( \frac{1}{p} -1 \right )}

CHOP Study allows us to see effectiveness of MCHAT in different sub-populations

Modulate sensitivity/specificity trade-offs
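
The PPV expression above can be checked numerically. The sketch below reads s as sensitivity, c as specificity, and p as prevalence; that reading is an assumption, since the slide does not define the symbols. The prevalence of 1 in 59 is taken from the autism slide, and the sensitivity/specificity pairs are illustrative values, not results from the study.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV = 1 / (1 + ((1 - c) / s) * (1 / p - 1)), with s, c, p as read above."""
    s, c, p = sensitivity, specificity, prevalence
    return 1.0 / (1.0 + ((1.0 - c) / s) * (1.0 / p - 1.0))


def ppv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Same quantity computed as TP / (TP + FP), as a consistency check."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


PREVALENCE = 1 / 59  # from the "1 in 59" slide
# Illustrative operating points only (sensitivity, specificity).
for s, c in [(0.5, 0.95), (0.5, 0.99), (0.8, 0.95)]:
    print(f"s={s}, c={c}: PPV={ppv(s, c, PREVALENCE):.3f}")
```

Raising specificity at fixed sensitivity increases PPV sharply at low prevalence, which is the trade-off the slide refers to.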

Data: Onishchenko et al. Science Advances 2021

The ZCoR Approach: Rapidly Retargetable

	ZED performance	Competition
Autism	>80% AUC at 2 yrs	"obvious"
Alzheimer's Disease	~90% AUC	60-70% AUC
Idiopathic Pulmonary Fibrosis	~90% AUC	NA
MACE	~80% AUC	~70% AUC
Bipolar Disorder	~85% AUC	NA
CKD	~85% AUC	NA
Cancers (Prostate, Bladder, Uterus, Skin)	~75-80% AUC	Low

Deploy all/many/most of these!

>5 Million in US. >13 Million in next 10 years

Alzheimer's Disease and Related Dementia

MOCA, Blood Tests

Current Practice:

state of art with EHR:

~67% AUC*

ZCoR: ~87%

Preempting ADRD accurately up to a decade into the future

Application to Suicide Attempts and Ideation (SISA), PTSD*:

perhaps surprising connection between mood disorders and physiological comorbidities

Gibbons RD, Kupfer D, Frank E, Moore T, Beiser DG, Boudreaux ED. Development of a Computerized Adaptive Test Suicide Scale-The CAT-SS. J Clin Psychiatry. 2017 Nov/Dec;78(9):1376-1382. doi: 10.4088/JCP.16m10922. PMID: 28493655.

* in press

Application to Malignant Neoplasms*

Melanoma

Melanoma has a high survival rate of over 90% when treated early. But if it progresses to later stages, the survival rate drops significantly. Identifying potentially life-threatening melanomas is crucial.

* in press

Medicine is poised to enter a transformative era, driven by the emergence of sophisticated Artificial Intelligence (AI) models.
Immense potential to reshape the realm of early disease diagnosis, prevention, and treatment strategies.
Accelerate scientific discovery towards deeper understanding of complex etiologies
Enable more holistic approaches to medicine, where predictive patterns can be rapidly recognized and exploited

Reading (References)

Onishchenko, Dmytro, Yi Huang, James van Horne, Peter J. Smith, Michael E. Msall, and Ishanu Chattopadhyay. “Reduced False Positives in Autism Screening via Digital Biomarkers Inferred from Deep Comorbidity Patterns.” Science Advances 7, no. 41 (October 8, 2021). https://doi.org/10.1126/sciadv.abf0354.

Onishchenko, Dmytro, Daniel S. Rubin, James R. van Horne, R. Parker Ward, and Ishanu Chattopadhyay. “Cardiac Comorbidity Risk Score: Zero‐Burden Machine Learning to Improve Prediction of Postoperative Major Adverse Cardiac Events in Hip and Knee Arthroplasty.” Journal of the American Heart Association 11, no. 15 (August 2, 2022). https://doi.org/10.1161/jaha.121.023745.

Onishchenko, Dmytro, Robert J. Marlowe, Che G. Ngufor, Louis J. Faust, Andrew H. Limper, Gary M. Hunninghake, Fernando J. Martinez, and Ishanu Chattopadhyay. “Screening for Idiopathic Pulmonary Fibrosis Using Comorbidity Signatures in Electronic Health Records.” Nature Medicine 28, no. 10 (September 29, 2022): 2107–16. https://doi.org/10.1038/s41591-022-02010-y.

Huang, Yi, Victor Rotaru, and Ishanu Chattopadhyay. “Sequence Likelihood Divergence for Fast Time Series Comparison.” Knowledge and Information Systems 65, no. 7 (March 16, 2023): 3079–98. https://doi.org/10.1007/s10115-023-01855-0.

Brenner, Lisa A., Lisa M. Betthauser, Molly Penzenik, Anne Germain, Jin Jun Li, Ishanu Chattopadhyay, Ellen Frank, David J. Kupfer, and Robert D. Gibbons. "Development and validation of computerized adaptive assessment tools for the measurement of posttraumatic stress disorder among US military veterans." JAMA Network Open 4, no. 7 (2021): e2115707-e2115707.

ACT II

ishanu chattopadhyay

Digital Twin of the Maturing Human Microbiome

Nicholas Sizemore

Kaitlyn Oliphant

Erika Claud

THE PROBLEM

Can a microbial assay from the gut actionably pre-empt developmental markers?

Assuming a 1,000-species ecosystem and one successful experiment per day, each discerning a single two-way relationship, we would need 1,368 years to go through all possibilities. If we look for three-way interactions, we would need 454,844 years.
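
The scale of this search can be reproduced with a short count of species pairs and triples. The sketch assumes one experiment per calendar day and a 365-day year; under that assumption it gives about 1,368 years for pairs and about 455,000 years for triples, close to the figures quoted above (the exact value depends on the assumed number of days per year).

```python
from math import comb

SPECIES = 1000
EXPERIMENTS_PER_YEAR = 365  # assumption: one successful experiment per calendar day

pairs = comb(SPECIES, 2)    # number of distinct two-way relationships
triples = comb(SPECIES, 3)  # number of distinct three-way interactions

print(f"pairs:   {pairs:,} -> {pairs / EXPERIMENTS_PER_YEAR:,.0f} years")
print(f"triples: {triples:,} -> {triples / EXPERIMENTS_PER_YEAR:,.0f} years")
```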

2019

PREEMPT

27 Million

Can we predict the next pandemic?

Can we predict future mutations? Can we define the "edge of emergence"?

Digital Twins for complex systems

PREEMPT

\Phi_i:\prod_{j \neq i} \Sigma_j \rightarrow \mathcal{D}(\Sigma_i)

Q-Net

recursive forest

This is a general method!

Data

$\downarrow $

Set of interdependent predictors

How do we measure "distance" between strains?

E-distance

a biologically informed, adaptive distance between strains

\theta(x,y) \triangleq \\ \mathbf{E}_i \left ( \mathbb{J}^{\frac{1}{2}} \left (\Phi_i^P(x_{-i}) , \Phi_i^Q(y_{-i})\right ) \right )

This distance is "special"

smaller distances imply a quantitatively high probability of spontaneous jump

$$\mathbb{J} \textrm{ is the Jensen-Shannon divergence }$$
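
The definition of the E-distance can be illustrated with a toy computation: for each position i, compare the two predicted distributions with the square root of the Jensen-Shannon divergence, then average over positions. The predictors below are hypothetical stand-ins for a learned Q-net (a real one would be inferred from sequence data), and all function names are assumptions made for this sketch.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions given as dicts."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0.0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def e_distance(x, y, predictors_p, predictors_q):
    """Average over positions i of sqrt(JS(Phi_i^P(x_{-i}), Phi_i^Q(y_{-i})))."""
    assert len(x) == len(y) == len(predictors_p) == len(predictors_q)
    total = 0.0
    for i in range(len(x)):
        x_rest = x[:i] + x[i + 1:]  # x with position i removed
        y_rest = y[:i] + y[i + 1:]
        js = js_divergence(predictors_p[i](x_rest), predictors_q[i](y_rest))
        total += math.sqrt(max(js, 0.0))  # guard against tiny negative rounding
    return total / len(x)


def make_toy_predictor(weight):
    """Hypothetical stand-in for a learned predictor Phi_i (not a real Q-net)."""
    def predict(context):
        favored = max(sorted(set(context)), key=context.count)
        rest = (1.0 - weight) / 3.0
        return {a: (weight if a == favored else rest) for a in "ACGT"}
    return predict


x, y = "AACG", "GGCA"
P = [make_toy_predictor(0.7) for _ in x]  # toy predictors for population P
Q = [make_toy_predictor(0.4) for _ in x]  # toy predictors for population Q

print(e_distance(x, y, P, Q))
```

With base-2 logarithms each term lies between 0 and 1, so the averaged distance does too; identical sequences under identical predictors give a distance of exactly zero.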

Sanov's Theorem & Pinsker's Inequality

Theorem

\left \vert \ln \frac{Pr(x \rightarrow y ) }{Pr( y \rightarrow y)} \right \vert \leqq \beta \theta(x,y)

Assume:

stable strain $x_{h}$, "well-adapted" $\Rightarrow Pr(x_h\rightarrow x_h) \approx 1 $

For "new" strain $x_{a}$, $ \displaystyle \theta(x_{a},x_{h}) \approx 0 $

Then, we have:

\left \vert \ln \frac{Pr(x_a \rightarrow x_h ) }{Pr( x_h \rightarrow x_h)} \right \vert \approx 0 \\ \Rightarrow Pr(x_a \rightarrow x_h ) \approx Pr(x_h \rightarrow x_h ) \\ \color{green}\Rightarrow Pr(x_a \rightarrow x_h ) \approx 1

we can tell if new strain will adapt to humans

A Math Solution to a Hard Biological Problem

Influenza Risk Assessment Tool (IRAT) scoring for animal strains

Can we replicate IRAT scores*?

slow (months), quasi-subjective, expensive

*https://www.cdc.gov/flu/pandemic-resources/monitoring/irat-virus-summaries.htm

genomic analysis

receptor binding

animal

transmission

antivirals available

population immunity

human infections

animal

hosts

global prevalence

antigenic novelty

disease severity

24 scores in 14 years

~10,000 strains collected annually

Emergenet: finding emergence risk of animal strains

Emergenet time: 1 second

Bio-NORAD

Let's go back to the Microbiome Problem

<class>_<observation_time>

<actinobacteria>_<30wk>

<clostridia>_<28wk>

construct qnet

Sanov's Theorem & Pinsker's Inequality

Theorem

\left \vert \ln \frac{Pr(x \rightarrow y ) }{Pr( y \rightarrow y)} \right \vert \leqq \beta \theta(x,y)

Assume:

stable profile $x_{h}$, "well-adapted" $\Rightarrow Pr(x_h\rightarrow x_h) \approx 1 $

For "new" profile $x_{a}$, $ \displaystyle \theta(x_{a},x_{h}) \approx 0 $

Then, we have:

we can tell if new profile will be stable

A Math Solution to a Hard Biological Problem

current state:

also all "future" values for a sample would be missing

typically sparse, lots of missing data

class_time	abundance level
Actinobacteria_28	a
Actinobacteria_29	-
Actinobacteria_30	b
. . .
Clostridia_28	g
. . .	-
Bacilli_28	d
. . .
Gammaproteobacteria_28	e
. . .	-
Coriobacteriia_28	w

missing

Biology-aware Perturbations to "reconstruct" missing data

Can we meaningfully perturb abundance values?

Can we fill them in if they are missing?

\textrm{Q-net}\\ \Phi_i:\prod_{j \neq i} \Sigma_j \rightarrow \mathcal{D}(\Sigma_i)

[Figure: Q-net reconstruction of a sample. Labels: sample, unknown, current state, "collapse", reconstructed observation]

completely uninformative state $\psi^0$: no information available for this sample yet

observed state $\psi$

$\phi_{\textrm{typical}}$: Q-net inferred with typical patients

$\phi_{\textrm{deficit}}$: Q-net inferred with patients with neurodevelopmental deficit

Risk of Time-stamped Microbial Profile to lead to Developmental Deficit

Risk(\psi) = \frac{\theta_{\textrm{typical}}(\psi,\psi^0)}{\theta_{\textrm{deficit}}(\psi,\psi^0)}

smaller the q-distance,

higher the likelihood of a jump

How different are the typical and deficit models?

[Figure panels, each comparing typical vs. deficit models: Actinobacteria 30, Bacilli 30, Bacteroidia 30, Coriobacteriia 32, Gammaproteobacteria 32, All Patients]

Feeding Variables added

Ability to "fill in" missing data is equivalent to making trajectory forecasts

Our risk measure is highly predictive and actionable

Which entities are most predictive?

Just add those microbes back?

No transplantation is guaranteed to work reliably

Predicted to reduce risk reliably

Supplantation MUST be personalized

Network Interpretations?

Typical

Deficit

Effect of Clinical Variables

Future

Concretely answer the question: "what is a healthy microbiome?"

Explicit supplantation profiles that are tuned to individual ecosystems

Bioreactor experiments

What other problems can it solve?

Q-Nets

Digital Twins for complex systems

Mental health diagnosis

opinion dynamics

algorithmic lie detector

YFA 2020

Yang, David, James Evans, and Ishanu Chattopadhyay. "‘Its the Economy Stupid’: Predictive Theory of Belief Shift Connecting Economic Stress to Societal Polarization." (under review, Nature Human Behaviour).

predict worldviews from incomplete data

VeRITAS

Can A Generative AI Tell if you Are Lying?

Vetting Response Integrity from cross-Talk in Adversarial Surveys

Hidden structure of cross-talk between responses to interview items

PTSD diagnostic interview

Q-Net

VeRITAS

High Complexity

Low Surprise

Responses that are reflective of symptoms

structured interview

properties of true responses

Minimum AUC = $0.95 \pm 0.005$

Cannot be coached, or memorized

Number of possible responses

10^{25}

Minimum Performance (n=624)

Average Time: 3.5 min

No. of Items: 20

AUC > 0.95

PPV > 0.86

NPV > 0.92

At least 83.3% sensitivity at 94% specificity

Beat the test!

paraknowledge.ai/veritas

Future

Vision

Universal screening for IPF, ADRD, autism, rare cancers
Continuous monitoring of psychological health
Reconfigurable Universal Screening (PCORI)
Bio-NORAD
Microbiome-based screening, Bioreactor experiments

Transform bio-surveillance

Transform modeling of complex systems

Transform early diagnosis

Democratize AI, unleashing its power for social good

Why are ML/AI models complicated, and non-transparent?

What is Data?

shallow
mechanically gathered
systematic record of information

individual data points are not that important

Tycho Brahe

(1546-1601)

Johannes Kepler (1571-1630)

Newtonian theory of Universal Gravitation (1684)

raw data

empirical fit

universal law of physics

30,000 experiments

Starting point of modern genetics

Mendel's Laws of Genetics

Johann Gregor Mendel (1822–1884)

Is this Big data?

Big data?

Some datasets are large, but simple: easily compressible or representable

Others are not.

intrinsic complexity
not representable by simple rules of generation

"big data" has irreducible complexity

Hence, "models" must have capacity to accommodate this complexity

Machine Learning and AI allow us to find "theories" which are no longer specifiable as simple equations, but require billions of parameters to specify

Medical history

co-morbidities

lifestyle

genetics

environment

Estimate disease risk

Estimate prognosis

Reduce missed and delayed diagnosis

Find prodromal patients for clinical trials

The Age of Data

Autism Spectrum Disorder + AI

Idiopathic Pulmonary Fibrosis + AI

Literature Search: AI + Target Disease

Current AI Applications are limited in practice

Are ML predictions pertaining to clinical diagnoses adding anything of relevance?

"predicting" autism > 3yrs
"predicting" autism with detailed videos on toddler behavior
"diagnosing" lung disease from lung imaging
"diagnosing" Alzheimer's Disease or cognitive disorder from detailed brain scan

Risk

The Key Stumbling Block: Features

How to find good features?

Good features

relevant risk factors

Leverage Vast Patient EHR and Insurance Claims Database(s)

Truven MarketScan (IBM)
Commercial Claims & Encounters Database

2003-2018

87M patients visible > 1 year

>7B individual claims

>87K unique diagnostic codes

>7% Medicare data present

Cloud Deployment

[
    {
        "patient_id": "P000038",
        "sex": "F",
        "birth_date": "01-01-2006",
        "DX_record": [
            {"date": "07-31-2006", "code": "Z38.00"},
            {"date": "08-07-2006", "code": "P59.9"},
            {"date": "08-29-2016", "code": "J01.90"},
            {"date": "09-10-2016", "code": "J01.90"},
            {"date": "11-14-2016", "code": "J01.91"}
        ],
        "RX_record": [
            {"date": "10-29-2011", "code": "rxLDA017"},
            {"date": "05-16-2015", "code": "rxIDG004"},
            {"date": "08-08-2015", "code": "rxIDG004"},
            {"date": "06-04-2016", "code": "rxIDD013"}
        ],
        "PROC_record": [
            {"date": "02-05-2007", "code": "90723"},
            {"date": "11-05-2007", "code": "J1100"}
        ]
    }
]

{
  "predictions": [
    {
      "error_code": "",
      "patient_id": "P000012",
      "predicted_risk": 0.005794344620009157,
      "probability": 0.8253881317184486
    }
  ],
  "target": "TARGET"
}

Data In

Data Out

The Paraknowledge API

curl -X POST -H "Content-Type: application/json" -d '[{"patient_id": "P28109965201", "sex": "M", "age": 89, "fips": "35644", "DX_record": [{"date": "12-16-2011", "code": "R09.02"}, {"date": "12-30-2011", "code": "H04.129"}, {"date": "12-30-2011", "code": "H02.109"}], "RX_record": [], "PROC_record": [{"date": "09-28-2012", "code": "71100"}]}]' "https://us-central1-pkcsaas-01.cloudfunctions.net/zcor_predict?target=IPF&api_key=7eea9f70d79c408f2b69847d911303c"
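
The same call can be sketched in Python using only the standard library. The endpoint, query parameters, and record fields are taken from the slides; the helper names, the "YOUR_API_KEY" placeholder, and the reading that a non-empty error_code marks a failed prediction are assumptions made for this sketch. The request is built but not sent, so the example runs without network access.

```python
import json
from urllib import request
from urllib.parse import urlencode

ENDPOINT = "https://us-central1-pkcsaas-01.cloudfunctions.net/zcor_predict"


def build_request(patients, target, api_key):
    """Build (but do not send) a POST request carrying a list of patient records."""
    url = f"{ENDPOINT}?{urlencode({'target': target, 'api_key': api_key})}"
    body = json.dumps(patients).encode("utf-8")
    return request.Request(url, data=body, method="POST",
                           headers={"Content-Type": "application/json"})


def parse_predictions(response_text):
    """Map patient_id -> predicted_risk, skipping entries with a non-empty error_code."""
    payload = json.loads(response_text)
    return {p["patient_id"]: p["predicted_risk"]
            for p in payload["predictions"] if not p["error_code"]}


# Patient record abbreviated from the curl example above.
patient = {
    "patient_id": "P28109965201", "sex": "M", "age": 89, "fips": "35644",
    "DX_record": [{"date": "12-16-2011", "code": "R09.02"}],
    "RX_record": [],
    "PROC_record": [{"date": "09-28-2012", "code": "71100"}],
}
req = build_request([patient], target="IPF", api_key="YOUR_API_KEY")
# To actually send: with request.urlopen(req) as resp: text = resp.read().decode()

# Response text copied from the "Data Out" example.
sample_response = (
    '{"predictions": [{"error_code": "", "patient_id": "P000012", '
    '"predicted_risk": 0.005794344620009157, "probability": 0.8253881317184486}], '
    '"target": "TARGET"}'
)
risks = parse_predictions(sample_response)
print(req.full_url)
print(risks)
```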

Current Targets

IPF
ILD
ADRD
CKD
CKD_SEVERE
MELANOMA
CANCER_PANCREAS
CANCER_UTERUS
SISA

Cohort Selection and Risk Analysis Testbed

https://paraknowledge.ai/zcor-testbed/

https://paraknowledge.ai/zcor-demo/

melanoma+dementia

Baseline prevalence of IPF in ILD patients

~25%

ZCoR PPV: 60% @ 50% sensitivity

1310 positive patients from 2183 flags

screen failure:

~70% $\rightarrow$ 40%
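
The arithmetic behind these figures can be checked directly. The sketch assumes that the screen failure rate is simply the fraction of flagged patients who are not confirmed positive (one minus the PPV); the function name is hypothetical.

```python
def screen_failure_rate(true_positives: int, flagged: int) -> float:
    """Fraction of flagged patients who turn out not to have the target condition."""
    return 1.0 - true_positives / flagged


POSITIVES, FLAGS = 1310, 2183  # counts quoted on the slide
ppv = POSITIVES / FLAGS
failure = screen_failure_rate(POSITIVES, FLAGS)
print(f"PPV = {ppv:.1%}, screen failure = {failure:.1%}")
```

Both values agree with the slide: a PPV of about 60% and a screen failure rate of about 40%.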

Selection comparison against baseline of 2+ ILD risk factors

baseline prevalence: ~2%

projected screen failure:

~98% baseline $\rightarrow$ 45%

Patient Journeys for IPF: Tracking increasing Risk Over Time

Up to 4-year "signal" resolution

patient journey

Other Examples

decreases risk

increases risk

Risk decreases sometimes

new codes change trajectory as they are revealed

Delving Deeper into Learning Goals

Early screening of complex diseases by leveraging deep pattern discovery in history of medical encounters
Use AI to transform the landscape of early disease diagnosis, prevention, and treatment strategies for complex medical conditions.
Realize universal primary care low-burden screening for disorders for which potentially no recommended screening tools exist currently
Generalize beyond known “risk factors”, uncover personalized predictors of future risk of serious diseases from subtle comorbidity signatures

Problem: Event-level prediction in social systems,

e.g. predicting crime before it happens

Predictive intelligence for security

Can we predict complex spatio-temporal stochastic processes?

Rotaru, Victor, Yi Huang, Timmy Li, James Evans, and Ishanu Chattopadhyay. "Event-level prediction of urban crime reveals a signature of enforcement bias in US cities." Nature human behaviour 6, no. 8 (2022): 1056-1068.

Problem: Can AI predict how we think and interact?

Can we predict how opinions evolve?

Digital Twins for complex systems

YFA 2020

Can an AI tell if you are lying?

Can an AI tell how you are going to vote?

Yang, David, James Evans, and Ishanu Chattopadhyay. "‘Its the Economy Stupid’: Predictive Theory of Belief Shift Connecting Economic Stress to Societal Polarization." (2023).

Learning Objectives

What is AI/Machine Learning? What are the key applications in the context of medicine? What does it bring to the table in the context of Health Services and Bio-medicine? Are there new questions that we can answer? Does it suffice to draw on off-the-shelf models? What are the new/emerging ideas?

Application of AI in Biomedicine: Why We Need a “Bio”-AI.
Emerging tools for addressing Late and Missed Diagnosis in Primary Care
Why “risk factors” are often not predictive enough, and how to think about more personalized predictors of future risk of serious diseases

Zero-burden EHR Analytics

Diagnostic & Screening for complex disorders

*CoR: Comorbid Risk Scores

ACoR (Autism)

PCoR (IPF/ILD)

ZCoR (ADRD/AD)

ZCoR-C (cancers with further specialization)

Three parameters

\kappa

\nu

\mu

Kolmogorov complexity

Surprise

Naive diagnostic Risk

304 VA participants with physician validated PTSD (malingering possibility not considered?)

310 online participants with no mental health diagnosis asked to intentionally malinger

~5% successfully beat the test

~89% of PTSD positive patients pass the test

Substance Abuse Disorder

\kappa

\nu

malingering

SUD

No SUD

Cook County Data

Estimated malingering rate 0.34

Year	Milestone
650 BCE	Babylonian astrology for prediction
1792	Old Farmer's Almanac first published
1890	National Weather Service
1943	McCulloch-Pitts neural network
1956	John McCarthy coins "Artificial Intelligence."
2006	Deep learning by Geoffrey Hinton
2011	IBM's Watson wins Jeopardy!
2020	GPT-3
2021	AI begins to outperform in healthcare diagnostics
2022	GPT-4 | DALL-E | AI reaching critical mass