Machine Learning for Biomedicine

CCTS 45200

Ishanu Chattopadhyay

Assistant Professor, Medicine

07.24.2023

Study Questions

1. What is Machine Learning? What are the key applications in the context of medicine?

2. What does it bring to the table in the context of Health Services and Bio-medicine? Are there new questions that we can answer?

3. Does it suffice to draw on off-the-shelf models?

4. What are the new/emerging ideas?

HW

1. What is Machine Learning? What are the key applications in the context of medicine? Which items on the reading list are more "ML" and which are more "statistical"?

2. What does it bring to the table in the context of Health Services and Bio-medicine? Are there new questions that we can answer?

3. Does it always suffice to draw on off-the-shelf models? Why or why not?

4. What are some new/emerging ideas beyond image classification and "drug discovery" in AI applications in medicine?

Reading

Dmytro Onishchenko, Robert J. Marlowe, Che G. Ngufor, Louis J. Faust, Andrew H. Limper, Gary M. Hunninghake, Fernando J. Martinez, Ishanu Chattopadhyay, "Screening for idiopathic pulmonary fibrosis using comorbidity signatures in electronic health records", Nature Medicine, sep, 2022.

Dmytro Onishchenko, Daniel S. Rubin, James R. van Horne, R. Parker Ward, Ishanu Chattopadhyay, "Cardiac Comorbidity Risk Score: Zero Burden Machine Learning to Improve Prediction of Postoperative Major Adverse Cardiac Events in Hip and Knee Arthroplasty", Journal of the American Heart Association, vol. 11, no. 15, pp. e023745, 2022.

Dmytro Onishchenko, Yi Huang, James van Horne, Peter J. Smith, Michael E. Msall, Ishanu Chattopadhyay, "Reduced false positives in autism screening via digital biomarkers inferred from deep comorbidity patterns", Science Advances, vol. 7, no. 41, oct, 2021.

Robert D. Gibbons, Ishanu Chattopadhyay, Herbert Y. Meltzer, John M. Kane, Daniel Guinart, "Development of a computerized adaptive diagnostic screening tool for psychosis", Schizophrenia Research, vol. 245, pp. 116–121, jul, 2022.

zed.uchicago.edu

Department of Pediatrics

UChicago

Department of Neurology & The Memory Center

UChicago

Department of Psychiatry

UChicago

Pulmonary Critical Care, Weill Cornell

Department of Anesthesia and Critical Care

UChicago

Center for Health Statistics

UChicago

Pulmonary Critical Care, Harvard Medical School

Department of Psychiatry

UIC

Demon Network, Exeter, Alan Turing Institute, UK

Dalhousie University, Canada

Pritzker School of Molecular Engineering

Social Science

UChicago

Zero-burden EHR Analytics

Diagnostic & Screening for complex disorders

*CoR: Comorbid Risk Scores

ACoR

PCoR

ZCoR

Universality

Autism

Bipolar Disorder

Idiopathic Pulmonary Fibrosis

Alzheimer's Disease

Perioperative Cardiac Event

Chronic Kidney Disease

...

complex, expensive, time-consuming diagnostic tests

Lack of Universal Screening at the point of care

Early diagnosis is difficult, late or missed diagnosis costs lives

Leverage Vast Patient Database

Truven MarketScan (IBM)
Commercial Claims & Encounters Database

2003-2018

87M patients visible > 1 year

>7B individual claims

>87K unique diagnostic codes

>7% Medicare data present

Big Data?

"Big" > Size

What is Data?

shallow
mechanically gathered
systematic record of information


Individual data points are not so important

Tabularia in ancient Rome – storehouses of receipts from individual purchases that gave the Romans vision into the state of commerce.

(78 B.C.)

Tycho Brahe

(1546-1601)

Johannes Kepler (1571-1630)

Newtonian theory of Universal Gravitation (1684)

30,000 experiments

Starting point of modern genetics

Mendel's Laws of Genetics

Johann Gregor Mendel (1822–1884)

Is this Big data?

Big data?

Some datasets are large, but simple: easily compressible or representable

Others are not.

Big data?

intrinsic complexity
not representable by simple rules of generation

The purpose of Data

Document phenomena precisely
Predict / forecast / make decisions
Basis of scientific hypotheses

scientific theory

Nature of scientific theory

Equations (principles)
Predictive / falsifiable
"Understandable"

Scientific theories are almost always simple!

The Scientific Method

How do we make this work for data too "big" for our "small" minds?

Hypothesize broad principles
Allow for the data complexity to unfold

Some black boxes allowed!

What if not all phenomena admit simple explanations?

Complex Phenomena

&

Answers we seek in them*

*Almost all questions in biology and social systems

Pandemics

Emergent Pathogens

Social Dynamics

Complex Diseases

Data

Forecast case count

Predict future mutations

Predict outcomes

Diagnose/screen diseases

"big data" has irreducible complexity

Hence, "models" must have capacity to accommodate this complexity

Machine Learning

& AI

Medical history

co-morbidities

lifestyle

genetics

environment

Estimate disease risk

Estimate prognosis

Reduce missed and delayed diagnosis

Find prodromal patients for clinical trials

The Age of Data

Risk

How is this different from Randomized Controlled Trials (RCTs)?

Machine Learning is poised to transform clinical discovery and outcome research

Cohort size & Composition

Scope of discovery

Translation to practice

Less well understood/reliable

smaller cohorts in RCT

narrow scope in RCT

"controlled" experiments

statistical rigor

Autism Spectrum Disorder + AI

Idiopathic Pulmonary Fibrosis + AI

Literature Search: AI + Target Disease

Are ML predictions pertaining to clinical diagnoses adding anything of relevance?

"predicting" autism > 3yrs
"predicting" autism with detailed videos on toddler behavior
"diagnosing" lung disease from lung imaging
"diagnosing" Alzheimer's Disease or cognitive disorder from detailed brain scan

Risk

The Key Stumbling Block: Features

How to find good features?

Good features

relevant risk factors

Must do pattern discovery

Discover factors that modulate risk, beyond what is already known

Must account for the possibility of non-causal spurious associations

Lesson

The need for Universal Screening

Often the problem is not that diseases cannot be diagnosed by physicians, but one of missed or late diagnoses in the primary care workflow
Universal screening for many diseases is non-existent
Tools that exist often yield "obvious" results

Takes too long,

not supported by insurance,

"gut feeling" / "wait & see" common

IPF diagnosed from lung imaging using CNN

Alzheimer's diagnosed from brain scan

Autism diagnosed by "AI" after 3 years

Good for writing papers, not clinically useful

1 in 59

Autism Spectrum Disorder

ASD: Ineffective screening causes delays and incurs costs

Autistic children experience higher rates of co-morbidities

Can we exploit these patterns to predict diagnosis?

Common Knowledge: Comorbidities Exist

Autism Co-morbid Risk (ACoR) Score

MCHAT/F

Head to head comparison with current practice

Autism Co-morbid Risk (ACoR) Score

Importance of different comorbidity categories

Feature types:

sequence likelihood
sequence likelihood defect
Proportion of specific categories
other sequence measures

17 categories chosen:

immune | infections | endocrine | ...

Joint Operation with MCHAT

PPV=\frac{1}{1+\frac{1-c}{s}\left ( \frac{1}{p} -1 \right )}

CHOP Study allows us to see effectiveness of MCHAT in different sub-populations

Modulate sensitivity/specificity trade-offs

Ishanu Chattopadhyay

Assistant Professor of Medicine

UChicago

Dmytro Onishchenko

UChicago

Rapid Universal Point-of-care Screening for ILD/IPF Using Comorbidity Signatures in Electronic Health Records

University of Chicago Medicine

NHLBI IPF Stakeholder Summit

Nov 2022

Fernando Martinez, Weill Cornell

Gary Hunninghake

Harvard Med School

Andrew Limper Mayo Clinic

shortness of breath

dry cough

doctor can hear velcro crackles

Common Symptoms

>50 years old

more men than women

IPF

Rare disease: ~5 in 10,000

Post-Dx survival: ~4 years

At least one misdiagnosis: ~55%

Two or more misdiagnoses: 38%

Initially attributed to age-related symptoms: 72%

Cannot always be seen on CXR

Non-specific symptoms

PCP workflow demands

Timeline: ZCoR screening flags patients ~4 yrs before the current clinical Dx; current survival after Dx is ~4 yrs

Onishchenko, D., Marlowe, R.J., Ngufor, C.G. et al. Screening for idiopathic pulmonary fibrosis using comorbidity signatures in electronic health records. Nat Med 28, 2107–2116 (2022). https://doi.org/10.1038/s41591-022-02010-y

n=~3M

AUC~90%

Likelihood ratio ~30

Conventional AI/ML attempts to model the physician

AI in IPF Research

Co-morbidity Patterns
No data demands
Use whatever data is already on patient file


Primary Care

Pulmonologist

ZCoR Flag

No blood tests
No imaging
No pulmonary function tests

Figure: case/control definition from ICD administrative codes (IPF and ILD target codes). Past medical history is the model input; in cases, target codes appear afterwards; in controls, no target codes appear (a 2-yr interval is marked on the figure).

IPF drugs prescribed

Signature of IPF diagnostic sequence

pirfenidone or nintedanib

age > 50 years
at least two IPF target codes identified at least 1 month apart
chest CT procedure (ICD-9-CM 87.41 and Current Procedural Terminology, 4th Edition, codes 71250, 71260 and 71270) before the first diagnostic claim for IPF
no claims for alternative ILD codes occurring on or after the first IPF claim
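The case-definition criteria above can be sketched as a filter over a patient's dated claims. This is a minimal Python illustration, not the study's code: the `Event` record, the function names, reading "1 month" as 30 days, applying the age cut-off at the first IPF claim, and treating an antifibrotic prescription as an alternative route to a case label are all assumptions. The IPF and alternative-ILD code sets are supplied by the caller (the strings in the example are placeholders); only the chest CT codes and the drug names come from the slide.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Quoted on the slide: chest CT (ICD-9-CM 87.41; CPT 71250, 71260, 71270) and the two antifibrotics.
CHEST_CT_CODES = {"87.41", "71250", "71260", "71270"}
IPF_DRUGS = {"pirfenidone", "nintedanib"}


@dataclass
class Event:
    """One dated claim (hypothetical record type): a diagnosis/procedure code or a drug name."""
    day: date
    code: str


def meets_ipf_signature(events, age_at_first_ipf, ipf_codes, alt_ild_codes,
                        min_gap=timedelta(days=30)):
    """Check the four diagnostic-sequence criteria; code sets are supplied by the caller."""
    ipf_days = sorted(e.day for e in events if e.code in ipf_codes)
    # Age > 50 (assumed to be measured at the first IPF claim) and at least two IPF claims.
    if age_at_first_ipf <= 50 or len(ipf_days) < 2:
        return False
    first_ipf = ipf_days[0]
    # Two IPF claims at least ~1 month apart: the widest pair is first vs. last.
    if ipf_days[-1] - first_ipf < min_gap:
        return False
    # A chest CT strictly before the first IPF claim.
    if not any(e.code in CHEST_CT_CODES and e.day < first_ipf for e in events):
        return False
    # No alternative ILD claim on or after the first IPF claim.
    return not any(e.code in alt_ild_codes and e.day >= first_ipf for e in events)


def is_ipf_case(events, age_at_first_ipf, ipf_codes, alt_ild_codes):
    """Assumption: an antifibrotic prescription OR the sequence signature yields a case label."""
    on_antifibrotic = any(e.code in IPF_DRUGS for e in events)
    return on_antifibrotic or meets_ipf_signature(events, age_at_first_ipf, ipf_codes, alt_ild_codes)


if __name__ == "__main__":
    # "IPF_X" and "ILD_Y" are placeholder codes, not real ICD codes.
    history = [Event(date(2015, 1, 5), "71250"),
               Event(date(2015, 3, 1), "IPF_X"),
               Event(date(2015, 5, 1), "IPF_X")]
    print(meets_ipf_signature(history, 62, {"IPF_X"}, {"ILD_Y"}))  # True
```

Checking each criterion in a separate early-return step keeps it easy to see which rule excluded a patient.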

Truven MarketScan (IBM)
Commercial Claims & Encounters Database
2003-2018

>100M patients visible

>7B individual claims

>87K unique diagnostic codes

>7% Medicare data present

2,053,277 patients included in study

University of Chicago Medical Center
2012-2021

68,658 patients

Random sample from OptumLabs Data Warehouse, courtesy of Mayo Clinic

861,280 patients

Total: 2,983,215 patients

performance tables

Marketscan Out-of-sample Results

Specificity ~99%

NPV>99.9%

IPF

ILD

performance tables

UCM Out-of-sample Results

Specificity ~99%

NPV>99.9%

IPF

ILD

False Positives:

Healthcare Capacity

Ethics:

Risk from Imaging Tests

For every 20-30 flags,

1 is positive

General likelihood ratio 60-80
PPV 3.5-5%
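The "1 positive per 20-30 flags" figure and the PPV are two views of the same quantity, and a likelihood ratio converts a prior probability into a PPV through Bayes' rule in odds form. A minimal Python sketch; the function names are illustrative only:

```python
def flags_per_true_positive(ppv):
    """Expected number of flagged patients per true case: the reciprocal of PPV."""
    return 1.0 / ppv


def posterior_probability(prior_prob, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = likelihood ratio x prior odds."""
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    for ppv in (0.035, 0.05):
        print(f"PPV {ppv:.1%} -> about {flags_per_true_positive(ppv):.1f} flags per true case")
```

A PPV of 3.5% to 5% corresponds to roughly 29 to 20 flags per true case, in line with the slide. The PPV that `posterior_probability` returns for a given likelihood ratio depends on the prevalence assumed in the screened population, which this slide does not state.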

Notifying patients 4 years early?

No cure, so why screen?

minimal

acceptable?

Better outcomes

early anti-fibrotic therapy seems increasingly promising

better shot at lung transplant

Early Dx reduces hospitalizations by a factor of 1-3

Collard, Harold R., Alex J. Ward, Stephan Lanes, D. Cortney Hayflinger, Daniel M. Rosenberg, and Elke Hunsche. "Burden of illness in idiopathic pulmonary fibrosis." Journal of medical economics 15, no. 5 (2012): 829-835.

Alzheimer's Disease and Related Dementia

>5 Million in US. >13 Million in next 10 years

Alzheimer's Disease and Related Dementia

Current practice: MoCA, blood tests

State of the art with EHR: ~67% AUC*

ZCoR: ~87% AUC


Preempting ADRD accurately up to a decade in the future

Applicable To Screening for Mild Cognitive Impairment

Clinical Trial Participant Selection

Current screen-failure rate: 80-90%

Estimated rate with ZCoR:

40%

The Secret Sauce: Inferring Probabilistic Machines from Data

Deep Learning Without Neural Networks: Fractal-nets for Rare Event Modeling (Under Review Nature Machine Intelligence)

Yi Huang, James Evans, I. Chattopadhyay

Sequence Likelihood Divergence For Fast Time Series Comparison

Yi Huang, Victor Rotaru, I. Chattopadhyay

Under Review, IEEE Transactions on Knowledge and Data Engineering

Abductive learning of quantized stochastic processes with probabilistic finite automata

Ishanu Chattopadhyay and Hod Lipson

Phil. Trans. R. Soc. A 371: 20110543 (2013)

Figures (female control vs. case cohorts): Immune, Endocrine, Cardiovascular

Secret Sauce: Leveraging Temporal Patterns

Specialized HMMs learned from code sequences

Model case and control cohorts separately

given a new test case, compute likelihood of sample arising from case models vs control models

sequence likelihood defect
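The case-vs-control likelihood comparison can be illustrated with a deliberately simpler stand-in: one first-order Markov chain with add-one smoothing per cohort, in place of the specialized HMMs / probabilistic automata the slides refer to. This is an assumption-laden sketch, not the authors' method; the symbol encoding (e.g. "0" for a period with no relevant code, "I" immune, "E" endocrine) and the name `likelihood_defect` are hypothetical. A positive score means the sequence is better explained by the case model.

```python
import math
from collections import Counter, defaultdict


class MarkovChain:
    """First-order Markov chain over a fixed alphabet, with add-one (Laplace) smoothing."""

    def __init__(self, alphabet):
        self.alphabet = list(alphabet)
        self.counts = defaultdict(Counter)  # counts[a][b] = number of a -> b transitions

    def fit(self, sequences):
        for seq in sequences:
            for a, b in zip(seq, seq[1:]):
                self.counts[a][b] += 1
        return self

    def log_likelihood(self, seq):
        k = len(self.alphabet)
        total = 0.0
        for a, b in zip(seq, seq[1:]):
            row = self.counts.get(a, Counter())
            total += math.log((row[b] + 1) / (sum(row.values()) + k))
        return total


def likelihood_defect(seq, case_model, control_model):
    """Per-transition log-likelihood ratio of case vs. control model (> 0: more case-like)."""
    n = max(len(seq) - 1, 1)
    return (case_model.log_likelihood(seq) - control_model.log_likelihood(seq)) / n


if __name__ == "__main__":
    case_m = MarkovChain("0IE").fit(["0III0II", "IIE0II"])      # toy case cohort
    ctrl_m = MarkovChain("0IE").fit(["000E000", "0E0000"])      # toy control cohort
    print(likelihood_defect("0IIII", case_m, ctrl_m))
    print(likelihood_defect("00000", case_m, ctrl_m))
```

Normalizing by the number of transitions keeps scores comparable across patients with histories of different lengths.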

AI, Equity & Fairness

If we have time:

Ian Cero, Peter A. Wyman, I. Chattopadhyay, Robert D. Gibbons, Predictive equity in suicide risk screening, Journal of the Academy of Consultation-Liaison Psychiatry, 2023. https://doi.org/10.1016/j.jaclp.2023.03.005


Ian Cero

Peter Wyman

Robert Gibbons

Suicide is a major public health concern

1 death by suicide every 40 seconds

As per the data from the CDC, in 2019, there were over 47,500 suicide deaths in the U.S., with an age-adjusted rate of 13.9 per 100,000 individuals.

10th leading cause of death in the United States

Screening Tests are Increasingly Common

Columbia-Suicide Severity Rating Scale (C-SSRS)

Patient Health Questionnaire-9 (PHQ-9)

Ask Suicide-Screening Questions (ASQ)

These screening tools are not meant to be diagnostic but rather to help identify individuals who may need further evaluation or intervention to prevent suicide.

Primary Care

Emergency Dept

School & Community


Coley RY, Johnson E, Simon GE, Cruz M, Shortreed SM. Racial/Ethnic Disparities in the Performance of Prediction Models for Death by Suicide After Mental Health Visits. JAMA Psychiatry. 2021 Jul 1;78(7):726–34.

The increasing standardization of suicide risk screening suggests that predictive models must balance not only accuracy, but also fairness for the different groups of people whose futures are being predicted

Figure: accuracy vs. fairness across Group A and Group B


Ask Suicide-Screening Questions (ASQ) has high and equivalent sensitivity and specificity for suicide ideation across black and white youth in the emergency department.

Black

Sensitivity

Specificity

Non-Hispanic White

Equal across groups

ASQ

Different Base rates (prevalence)

6.11 per 100,000*

15.68 per 100,000*

Non-Hispanic White

Black

*CDC 2019 Data

Uneven base rates

Mathematically unavoidable trade-off between model accuracy and fairness
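A minimal Python sketch of why equal sensitivity and specificity cannot yield equal PPV when base rates differ, using the PPV and NPV relations in terms of sensitivity s, specificity c and prevalence ρ. The value 0.9 for both s and c is hypothetical (the slide does not give the ASQ's numbers); the two prevalences are the CDC 2019 rates quoted above, per 100,000.

```python
def ppv(s, c, rho):
    """Positive predictive value from sensitivity s, specificity c, prevalence rho."""
    return s * rho / (s * rho + (1 - c) * (1 - rho))


def npv(s, c, rho):
    """Negative predictive value from sensitivity s, specificity c, prevalence rho."""
    return c * (1 - rho) / (c * (1 - rho) + (1 - s) * rho)


S, C = 0.9, 0.9  # hypothetical, identical for both groups
BASE_RATES = {"lower base rate": 6.11e-5, "higher base rate": 15.68e-5}

for name, rho in BASE_RATES.items():
    print(f"{name}: PPV={ppv(S, C, rho):.4%}  NPV={npv(S, C, rho):.6%}")
```

With identical s and c, the PPV in the higher-base-rate group comes out roughly in proportion to the ratio of base rates, while NPV moves the other way; no threshold choice equalizes both at once.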

Another Example: criminal recidivism

In 2016, ProPublica analyzed over 10,000 actual predictions from a popular recidivism prediction model (COMPAS)

Black defendants were twice as likely as white defendants to receive a false positive classification

Creators of COMPAS presented equally compelling findings

The model’s overall classification accuracy (about 64%) was, in fact, equal for black and white defendants

Larson J, Mattu S, Kirchner L, Angwin J. How We Analyzed the COMPAS Recidivism Algorithm [Internet]. ProPublica. 2016 [cited 2022 Dec 30]. Available from: https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm

Disparity is UNLIKELY to be due to "biased data" or a biased model

*Kleinberg J, Mullainathan S, Raghavan M. Inherent Trade-Offs in the Fair Determination of Risk Scores. ArXiv160905807 Cs Stat [Internet]. 2016 Nov 17 [cited 2019 Nov 6]; Available from: http://arxiv.org/abs/1609.05807

Predictive disparity is likely caused by uneven base rates of the outcome being predicted*

Confusion Matrix with 2 classes

Common Performance Metrics

Relationships between Performance Metrics

TPR = \frac{t_p}{P} = \frac{t_p}{t_p+f_n}\\ TNR = \frac{t_n}{N} = \frac{t_n}{t_n+f_p}\\ FPR =1-TNR\\ PPV =\frac{t_p}{t_p+f_p}\\ \rho =\frac{P}{N+P}

t_p : \textrm{ true positives }, t_n: \textrm{ true negatives }

f_p : \textrm{ false positives }, f_n: \textrm{ false negatives }

sensitivity

specificity

precision

prevalence
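The definitions above map directly onto code. A minimal Python sketch that computes them from the four confusion-matrix counts; the function name and dictionary keys are illustrative:

```python
def metrics_from_counts(tp, fp, tn, fn):
    """Sensitivity, specificity, FPR, precision and prevalence from a 2-class confusion matrix."""
    P = tp + fn  # actual positives
    N = tn + fp  # actual negatives
    tpr = tp / P  # sensitivity
    tnr = tn / N  # specificity
    return {
        "tpr": tpr,
        "tnr": tnr,
        "fpr": 1 - tnr,
        "ppv": tp / (tp + fp),  # precision
        "rho": P / (N + P),     # prevalence
    }


if __name__ == "__main__":
    print(metrics_from_counts(tp=8, fp=5, tn=85, fn=2))
```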

Relationships between Performance Metrics

PPV = \frac{t_p/P}{t_p/P + (f_p/N)(N/P)} = \frac{TPR}{TPR + ((N-t_n)/N)(N/P)}

t_p : \textrm{ true positives }, t_n: \textrm{ true negatives }

f_p : \textrm{ false positives }, f_n: \textrm{ false negatives }

s : \textrm{ sensitivity }, c: \textrm{ specificity }

NPV = \frac{1}{1+ \frac{1-s}{c \left ( \frac{1}{\rho}-1\right )} }

PPV = \frac{s}{s + (1-c)(\frac{1}{\rho} -1)}
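A quick numerical check, in Python, that the closed forms for PPV and NPV in terms of s, c and ρ agree with the direct count definitions t_p/(t_p+f_p) and t_n/(t_n+f_n); the confusion-matrix counts are made up for illustration.

```python
def ppv_closed_form(s, c, rho):
    """PPV = s / (s + (1 - c)(1/rho - 1))."""
    return s / (s + (1 - c) * (1 / rho - 1))


def npv_closed_form(s, c, rho):
    """NPV = 1 / (1 + (1 - s) / (c (1/rho - 1)))."""
    return 1 / (1 + (1 - s) / (c * (1 / rho - 1)))


# Illustrative counts only.
tp, fn, tn, fp = 30, 10, 900, 60
P, N = tp + fn, tn + fp
s, c, rho = tp / P, tn / N, P / (P + N)

assert abs(ppv_closed_form(s, c, rho) - tp / (tp + fp)) < 1e-12
assert abs(npv_closed_form(s, c, rho) - tn / (tn + fn)) < 1e-12
print(ppv_closed_form(s, c, rho), npv_closed_form(s, c, rho))
```

Here s = 0.75, c = 0.9375 and ρ = 0.04, so N/P = 1/ρ - 1 = 24, which is the factor that links the count form to the prevalence form.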


Prevalence is an intrinsic property of the disease, not of the test


Manic Episode with no Bipolar history

prevalence: ~10%


Idiopathic Pulmonary Fibrosis

prevalence: ~0.5%

The decision threshold is up to us to decide

Impacts sensitivity & specificity

Sensitivity Specificity Tradeoff

Each choice of a threshold produces a different test
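To make "each threshold produces a different test" concrete, a minimal Python sketch that recomputes sensitivity and specificity at three cut-offs on a toy set of risk scores. The scores, labels and the `sens_spec_at` name are illustrative, and a flag is assumed to be raised when score >= threshold.

```python
def sens_spec_at(scores, labels, threshold):
    """Sensitivity and specificity when flagging every score >= threshold."""
    tp = sum(1 for x, y in zip(scores, labels) if y and x >= threshold)
    fn = sum(1 for x, y in zip(scores, labels) if y and x < threshold)
    tn = sum(1 for x, y in zip(scores, labels) if not y and x < threshold)
    fp = sum(1 for x, y in zip(scores, labels) if not y and x >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]  # toy model outputs
labels = [0, 0, 1, 0, 1, 0, 1, 1]                    # toy ground truth

for t in (0.2, 0.5, 0.75):
    print(t, sens_spec_at(scores, labels, t))
```

Here thresholds 0.2, 0.5 and 0.75 give (sensitivity, specificity) of (1.0, 0.25), (0.75, 0.75) and (0.5, 1.0): raising the threshold trades sensitivity for specificity.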

UCM Data

Blacks

Non-Hispanic Whites

AUC~90%

AUC~88%

Universal Screening for Suicidal Ideation / Attempts

$466,700

$135,700

Assume you have $1,000,000 to allocate to the post-screening follow-up service (demographic breakdown at UCM). Number of actual individuals helped under four allocations between the two groups:

Allocation 67% / 33%: 15 + 25 = 40

Allocation 44% / 66% (differential base rate, race-blind follow-up): 9 + 49 = 58

Allocation 100% / 0%: 21 + 0 = 21

Allocation 77.5% / 22.5% (equal outcome allocation): 17 + 17 = 34

No blood tests, no questionnaires, just diagnostic codes.

Instantaneous Universal Screening at Primary Care.

Works even for patients without history of mental disorders.

Screening

Posterior odds of SI/SA in flagged population: 13 in 20

Prior odds of SI/SA in general population: 1 in 20

3 out of 13 true flags have no prior history of mental disorders
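The prior and posterior figures imply how strongly a flag shifts the odds. A minimal Python sketch, assuming "1 in 20" is the prevalence in the general population and "13 in 20" is the fraction of flagged patients who are true positives (both read as probabilities; if both were odds in the strict sense, the ratio would simply be 13):

```python
from fractions import Fraction


def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)


prior = Fraction(1, 20)       # assumption: probability of SI/SA before screening
posterior = Fraction(13, 20)  # assumption: probability of SI/SA given a flag

implied_lr = odds(posterior) / odds(prior)  # positive likelihood ratio of the flag
print(implied_lr, float(implied_lr))  # 247/7, about 35.3
```

Exact fractions avoid rounding, so the implied likelihood ratio comes out as (13/7) / (1/19) = 247/7.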

The Screening Test is at its performance limit

The Ethics Question

Lives saved:

Distribute resources race-blind: 58

Distribute resources to make equal outcomes: 34

The new frontier of predictive fairness in suicide prediction

Large-scale, prospectively designed studies are needed to investigate the full scope of the problem and optimal alternatives, considering not only traditional cost measures but also screening mistakes and community stakeholders' preferences.
Suicide prevention research can be informed by progress in algorithmic fairness, such as predictive models constrained by a fairness budget and survey methods to elicit desired fairness trade-offs from community members.
New best practices in predictive modeling of suicide risk should include optimization of both accuracy and fairness.
Practice guidelines for individual clinicians need to be developed based on prospective research studies, with caution against making ad hoc adjustments to screening and risk thresholds.

References

[1] Coley RY, et al. JAMA Psychiatry. 2021;78(7):726–34.
[2] Kearns M, Roth A. Oxford University Press; 2019.
[3] Wang X, et al. Manag Syst Eng. 2022;1(1):7.
[5] Kleinberg J, et al. ArXiv160905807 Cs Stat. 2016.
[8] Jung C, et al. arXiv. 2020.
[11] Zafar MB, et al. arXiv. 2017.
[12] Dwork C, et al. Proceedings of the 3rd Innovations in Theoretical Computer Science Conference. 2012.


Predicting Pandemics