AI and Machine Learning in Medicine:

Screening for Complex Diseases

Rapidly, Cheaply, and Early

Ishanu Chattopadhyay, PhD

Assistant Professor of Medicine

University of Chicago

ishanu@uchicago.edu

02.28.2024

Learning Objectives

What is AI/Machine Learning? What are the key applications in the context of medicine? What does it bring to the table in the context of Health Services and Bio-medicine? Are there new questions that we can answer? Does it suffice to draw on off-the-shelf models? What are the new/emerging ideas?

  • Application of AI in Biomedicine: Why We Need a “Bio”-AI.

  • Emerging tools for addressing Late and Missed Diagnosis in Primary Care

  • Why “risk factors” are often not predictive enough, and how to think about more personalized predictors of future risk of serious diseases

Universal Screening?

  • Autism
  • Idiopathic Pulmonary Fibrosis
  • Alzheimer's Disease and related dementia
  • Suicidality, PTSD
  • Perioperative Cardiac Event
  • Aggressive Melanoma
  • Uterine Cancer
  • Pancreatic Cancer
  • ...      
  • expensive, time-consuming diagnostic tests
  • Lack of Universal Screening at the point of care
  • Early diagnosis is difficult, late or missed diagnosis costs lives

Zero-burden EHR Analytics

Diagnostics & Screening for complex disorders

*CoR: Comorbid Risk Scores

ACoR (Autism)

PCoR (IPF/ILD)

ZCoR (ADRD/AD)

ZCoR-C (cancers with further specialization)

Leverage Vast Patient EHR and Insurance Claims Database(s)

Truven MarketScan (IBM)
Commercial Claims & Encounters Database

2003-2018

87M patients visible > 1 year

>7B individual claims

>87K unique diagnostic codes

 

>7% Medicare data present

Why are ML/AI models complicated, and non-transparent?

What is Data?

  • shallow
  • mechanically gathered
  • systematic record of information

individual data points are not so important

Tycho Brahe

(1546-1601)

Johannes Kepler (1571-1630)

Newtonian theory of Universal Gravitation (1684)

raw data

empirical fit

universal law of physics

30,000 experiments

Starting point of modern genetics

Mendel's Laws of Genetics

Johann Gregor Mendel (1822–1884)

Is this Big data?

Big data?

Some datasets are large, but simple: easily compressible or representable

 

Others are not.

  • intrinsic complexity
  • not representable by simple rules of generation

"big data" has irreducible complexity

 

Hence, "models" must have capacity to accommodate this complexity

Machine Learning and AI allow us to find "theories" that can no longer be specified as simple equations, but require billions of parameters to specify

Medical history

co-morbidities

lifestyle

genetics

environment

 

Estimate disease risk

Estimate prognosis

Reduce missed and delayed diagnosis

Find prodromal patients for clinical trials

The Age of Data

Autism Spectrum Disorder + AI

Idiopathic Pulmonary Fibrosis + AI

Literature Search: AI + Target Disease

Current AI Applications are limited in practice

Are ML predictions pertaining to clinical diagnoses adding anything of  relevance?

  • "predicting" autism > 3yrs
  • "predicting" autism with detailed videos on toddler behavior
  • "diagnosing" lung disease from lung imaging
  • "diagnosing" Alzheimer's Disease or cognitive disorder from detailed brain scan

Risk

The Key Stumbling Block: Features

How to find good features?

Good features

relevant risk factors

Rapid Universal Point-of-care Screening for ILD/IPF Using Comorbidity Signatures in Electronic Health Records

Flag patients before they (or doctors) suspect 

Primary Care

Pulmonologist

?

Zero-burden Co-morbid Risk Score (ZCoR)

shortness of breath

dry cough

doctor can hear velcro crackles

Common Symptoms

>50 years old

more men than women

IPF

Rare disease

~5 in 10,000

Post-Dx

Survival

~4 years

At least one misdiagnosis

~55%

Two or more misdiagnoses

38%

Initially attributed to age related symptoms:

72%

Cannot always be seen on CXR

Non-specific symptoms

PCP workflow demands

~ 4yrs

current  survival ~4yrs

~ 4yrs

current clinical DX

ZCoR screening

Onishchenko, D., Marlowe, R.J., Ngufor, C.G. et al. Screening for idiopathic pulmonary fibrosis using comorbidity signatures in electronic health records. Nat Med 28, 2107–2116 (2022). https://doi.org/10.1038/s41591-022-02010-y

n=~3M

AUC~90%

Likelihood ratio ~30

Conventional AI/ML  attempts to model the physician

AI in IPF Research

  • Co-morbidity Patterns
  • No data demands
  • Use whatever data is already on patient file

ICD administrative codes

IPF

ILD

target codes appear

Past medical history

No target codes appear

case

control

2yrs

2yrs

prediction

IPF drugs prescribed

Signature of IPF diagnostic sequence

pirfenidone or nintedanib

  • age > 50 years
  • at least two IPF target codes identified at least 1 month apart 
  • chest CT procedure (ICD-9-CM 87.41 and Current Procedural Terminology, 4th Edition, codes 71250, 71260 and 71270) before the first diagnostic claim for IPF
  • no claims for alternative ILD codes occurring on or after the first IPF claim
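The four criteria above can be sketched as a filter over claim records. This is a minimal illustration, not the study's code: the IPF target codes, the 30-day reading of "1 month", the choice of the first IPF claim as the age reference date, and the `alt_ild_codes` list are all assumptions, since the slides do not spell them out. Only the chest CT codes come from the slide. The drug-based signature (pirfenidone or nintedanib) is not covered here.

```python
from datetime import date, timedelta

# Assumed for illustration: the slides do not list the IPF target codes.
IPF_CODES = {"J84.112", "516.31"}
# Chest CT codes quoted on the slide (ICD-9-CM procedure 87.41 and CPT-4 codes).
CHEST_CT_CODES = {"87.41", "71250", "71260", "71270"}
# "1 month" is approximated as 30 days (assumption).
MIN_GAP = timedelta(days=30)


def meets_case_definition(birth, dx, procs, alt_ild_codes):
    """Return True if a record satisfies all four criteria.

    dx and procs are lists of (date, code) tuples.
    """
    ipf_dates = sorted(d for d, code in dx if code in IPF_CODES)
    if not ipf_dates:
        return False
    first_ipf = ipf_dates[0]

    # 1. Age > 50 years, measured at the first IPF claim (assumed reference date).
    if (first_ipf - birth).days / 365.25 <= 50:
        return False
    # 2. At least two IPF target codes at least one month apart.
    if ipf_dates[-1] - first_ipf < MIN_GAP:
        return False
    # 3. A chest CT before the first diagnostic claim for IPF.
    if not any(code in CHEST_CT_CODES and d < first_ipf for d, code in procs):
        return False
    # 4. No alternative ILD codes on or after the first IPF claim.
    if any(code in alt_ild_codes and d >= first_ipf for d, code in dx):
        return False
    return True


ALT_ILD = {"J84.10"}  # hypothetical stand-in for the alternative ILD code list
birth = date(1950, 1, 1)
procs = [(date(2012, 2, 10), "71250")]
dx_case = [(date(2012, 3, 1), "J84.112"), (date(2012, 5, 15), "J84.112")]
dx_single = dx_case[:1]
dx_alt = dx_case + [(date(2012, 6, 1), "J84.10")]

print(meets_case_definition(birth, dx_case, procs, ALT_ILD))
print(meets_case_definition(birth, dx_single, procs, ALT_ILD))
print(meets_case_definition(birth, dx_alt, procs, ALT_ILD))
```

In this toy data only the first record qualifies: the second has a single IPF code, and the third carries an alternative ILD code after the first IPF claim.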

ICD Codes can be noisy

"cases" are not always true IPF

Truven MarketScan (IBM)
Commercial Claims & Encounters Database
2003-2018

>100M patients visible 

>7B individual claims

>87K unique diagnostic codes

>7% Medicare data present

2,053,277 patients included in study

University of Chicago Medical Center
2012-2021

68,658 patients

Random sample from OptumLabs Data Warehouse, courtesy of Mayo Clinic

861,280 patients 

2,983,215 patients

Data: Onishchenko et al., Nat. Medicine 2022

performance tables

MarketScan Out-of-sample Results

specificity ~99%

NPV>99.9%

IPF

ILD

performance tables

UCM Out-of-sample Results

specificity ~99%

NPV>99.9%

IPF

ILD

Comorbidity Spectra

patient A

patient B

patient C

lesson 1

Beyond "risk factors" to personalized risk patterns

False Positives: 

  • Healthcare Capacity

Ethics:

  • Risk from Imaging Tests

For every 20-30 flags, 1 is positive

  • General likelihood ratio 60-80
  • PPV 3.5-5%
  • Notifying patients 4 years early?
  • No cure, why screen
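The figures above are linked by simple arithmetic, sketched below. The 3.5-5% PPV range and the likelihood ratios of 60-80 are taken from the slide. The prevalence of 5 in 10,000 is borrowed from the earlier "rare disease" slide and is an assumption about the screened population, so the second loop is indicative only. Both function names are hypothetical.

```python
def ppv_from_likelihood_ratio(prevalence: float, lr_positive: float) -> float:
    """Posterior probability of disease after a positive flag (odds form of Bayes' rule)."""
    prior_odds = prevalence / (1.0 - prevalence)
    posterior_odds = lr_positive * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


def flags_per_true_positive(ppv: float) -> float:
    """Average number of flagged patients per truly positive patient."""
    return 1.0 / ppv


# PPV range quoted on the slide -> flags needed per true positive.
for ppv in (0.035, 0.05):
    print(f"PPV {ppv:.1%}: about 1 in {flags_per_true_positive(ppv):.0f} flags is positive")

# Assumed prevalence of 5 in 10,000 combined with the quoted likelihood ratios.
for lr in (60, 80):
    print(f"LR {lr}: PPV = {ppv_from_likelihood_ratio(5e-4, lr):.2%}")
```

A PPV of 3.5-5% corresponds to one true positive in roughly 20 to 29 flags, matching the "20-30 flags" figure. Under the assumed prevalence, likelihood ratios of 60-80 give a PPV of roughly 3 to 4%, the same order as the quoted range.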

minimal

acceptable?

Better outcomes

  • early anti-fibrotic therapy seems increasingly promising
  • better shot at lung transplant
  • early dx reduces hospitalizations by a factor of 1-3

Collard, Harold R., Alex J. Ward, Stephan Lanes, D. Cortney Hayflinger, Daniel M. Rosenberg, and Elke Hunsche. "Burden of illness in idiopathic pulmonary fibrosis." Journal of medical economics 15, no. 5 (2012): 829-835.

Clinical Trial Cohort Selection

Current screen failure rate ~50-60%

ZCoR boosted screen failure rate ~20%

Longitudinal history is important

lesson 2

Off-the-shelf AI does not suffice

lesson 3

Leveraging Longitudinal  Patterns

Specialized HMM models from code sequences

Model control and case cohorts separately

Given a new test case, compute the likelihood of the sample arising from the case models vs. the control models

sequence likelihood defect
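The idea of scoring a new sequence against separately trained case and control models can be sketched with a much simpler stand-in. The slides refer to specialized HMM models; the code below swaps in first-order Markov chains with add-one smoothing purely for illustration. The log-likelihood difference used for the "defect", its sign convention, the class and function names, and the toy code categories are all assumptions.

```python
import math
from collections import defaultdict


class MarkovChain:
    """First-order Markov chain over codes with add-one smoothing (illustrative stand-in)."""

    def __init__(self, alphabet):
        self.alphabet = sorted(alphabet)
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, sequences):
        # Count observed transitions prev -> nxt across all training sequences.
        for seq in sequences:
            for prev, nxt in zip(seq, seq[1:]):
                self.counts[prev][nxt] += 1
        return self

    def log_likelihood(self, seq):
        # Sum of smoothed log transition probabilities along the sequence.
        total = 0.0
        for prev, nxt in zip(seq, seq[1:]):
            row = self.counts.get(prev, {})
            denom = sum(row.values()) + len(self.alphabet)
            total += math.log((row.get(nxt, 0) + 1) / denom)
        return total


def likelihood_defect(seq, case_model, control_model):
    """Positive when the sequence is better explained by the case model (assumed convention)."""
    return case_model.log_likelihood(seq) - control_model.log_likelihood(seq)


# Toy code categories and cohorts, purely hypothetical.
alphabet = {"resp", "cardio", "other"}
case_train = [["other", "resp", "resp", "resp"], ["resp", "resp", "cardio", "resp"]]
control_train = [["other", "other", "cardio", "other"], ["other", "other", "other", "resp"]]

case_model = MarkovChain(alphabet).fit(case_train)
control_model = MarkovChain(alphabet).fit(control_train)

case_like = likelihood_defect(["resp", "resp", "resp"], case_model, control_model)
control_like = likelihood_defect(["other", "other", "other"], case_model, control_model)
print(f"case-like sequence:    {case_like:+.3f}")
print(f"control-like sequence: {control_like:+.3f}")
```

Modeling the two cohorts separately means the score depends on the order of codes as well as their presence, which is where longitudinal history enters.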

ZeD Lab: Predictive Screening from Comorbidity Footprints

Nature Medicine

JAHA

CELL Reports

Science Adv.

1 in 59

Autism Spectrum Disorder

ASD: Ineffective screening causes delays and incurs costs

Autism Co-morbid Risk (ACoR) Score

Data: Onishchenko et al., Science Advances 2021

MCHAT/F

Head to head comparison with current practice

Joint Operation with MCHAT

\[ PPV = \frac{1}{1+\frac{1-c}{s}\left( \frac{1}{p} - 1 \right)} \]
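The PPV expression can be evaluated directly. The sketch below reads s as sensitivity, c as specificity, and p as prevalence. The slide does not define the symbols, so this reading is an assumption, though it is the one under which the expression agrees with Bayes' rule. The prevalence of 1 in 59 is the ASD figure quoted earlier, and the sensitivity of 0.5 is an arbitrary illustrative value.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV = 1 / (1 + ((1 - c) / s) * (1 / p - 1))."""
    s, c, p = sensitivity, specificity, prevalence
    return 1.0 / (1.0 + ((1.0 - c) / s) * (1.0 / p - 1.0))


def ppv_bayes(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Same quantity written as true positives over all positives, as a cross-check."""
    tp = sensitivity * prevalence
    fp = (1.0 - specificity) * (1.0 - prevalence)
    return tp / (tp + fp)


prevalence = 1 / 59   # ASD prevalence quoted on the slides
sensitivity = 0.5     # illustrative value, not from the slides

for specificity in (0.90, 0.95, 0.99):
    value = ppv(sensitivity, specificity, prevalence)
    print(f"specificity {specificity:.2f}: PPV = {value:.1%}")
```

At fixed sensitivity and low prevalence, PPV is driven mostly by specificity, so trading a little sensitivity for specificity can sharply cut false positives.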

CHOP Study allows us to see effectiveness of MCHAT in different sub-populations

Modulate sensitivity/specificity trade-offs

The ZCoR Approach: Rapidly Re-targetable

Target                                       ZeD performance        Competition
Autism                                       >80% AUC at 2 yrs      "obvious"
Alzheimer's Disease                          ~90% AUC               60-70% AUC
Idiopathic Pulmonary Fibrosis                ~90% AUC               NA
MACE                                         ~80% AUC               ~70% AUC
Bipolar Disorder                             ~85% AUC               NA
CKD                                          ~85% AUC               NA
Cancers (Prostate, Bladder, Uterus, Skin)    ~75-80% AUC            Low

Deploy all/many/most of these!

>5 Million in US. >13 Million in next 10 years

Alzheimer's Disease and Related Dementia

MOCA, Blood Tests

Current Practice:

state of art with EHR:

~67% AUC*

 

ZCoR:  ~87%

Preempting ADRD accurately up to a decade in the future

Application to Suicide Attempts and Ideation (SISA), PTSD*

perhaps surprising connection between mood disorders and physiological comorbidities

Gibbons RD, Kupfer D, Frank E, Moore T, Beiser DG, Boudreaux ED. Development of a Computerized Adaptive Test Suicide Scale-The CAT-SS. J Clin Psychiatry. 2017 Nov/Dec;78(9):1376-1382. doi: 10.4088/JCP.16m10922. PMID: 28493655.

* in press

Application to Malignant Neoplasms*

Melanoma

Melanoma has a high survival rate of over 90% when treated early. But if it progresses to later stages, the survival rate drops significantly. Identifying potentially life-threatening melanomas is crucial.

* in press

Cloud Deployment

Data In

[
    {
        "patient_id": "P000038",
        "sex": "F",
        "birth_date": "01-01-2006",
        "DX_record": [
            {"date": "07-31-2006", "code": "Z38.00"},
            {"date": "08-07-2006", "code": "P59.9"},
            {"date": "08-29-2016", "code": "J01.90"},
            {"date": "09-10-2016", "code": "J01.90"},
            {"date": "11-14-2016", "code": "J01.91"}
        ],
        "RX_record": [
            {"date": "10-29-2011", "code": "rxLDA017"},
            {"date": "05-16-2015", "code": "rxIDG004"},
            {"date": "08-08-2015", "code": "rxIDG004"},
            {"date": "06-04-2016", "code": "rxIDD013"}
        ],
        "PROC_record": [
            {"date": "02-05-2007", "code": "90723"},
            {"date": "11-05-2007", "code": "J1100"}
        ]
    }
]

Data Out

{
  "predictions": [
    {
      "error_code": "",
      "patient_id": "P000012",
      "predicted_risk": 0.005794344620009157,
      "probability": 0.8253881317184486
    }
  ],
  "target": "TARGET"
}

The Paraknowledge API

curl -X POST -H "Content-Type: application/json" -d '[{"patient_id": "P28109965201", "sex": "M", "age": 89, "fips": "35644", "DX_record": [{"date": "12-16-2011", "code": "R09.02"}, {"date": "12-30-2011", "code": "H04.129"}, {"date": "12-30-2011", "code": "H02.109"}], "RX_record": [], "PROC_record": [{"date": "09-28-2012", "code": "71100"}]}]' "https://us-central1-pkcsaas-01.cloudfunctions.net/zcor_predict?target=IPF&api_key=7eea9f70d79c408f2b69847d911303c"

Current Targets

IPF
ILD
ADRD
CKD
CKD_SEVERE
MELANOMA
CANCER_PANCREAS
CANCER_UTERUS
SISA
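The curl call above can be mirrored in Python. The sketch below only builds the request and does not send it. The endpoint URL, the `target` and `api_key` query parameters, and the record fields are copied from the slide examples. The helper name `build_zcor_request` and the placeholder key are hypothetical, and the response handling in the commented-out lines assumes the "Data Out" shape shown earlier.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from the curl example on the slide.
BASE_URL = "https://us-central1-pkcsaas-01.cloudfunctions.net/zcor_predict"


def build_zcor_request(patients, target, api_key):
    """Build (but do not send) a POST request mirroring the curl call."""
    url = f"{BASE_URL}?{urlencode({'target': target, 'api_key': api_key})}"
    body = json.dumps(patients).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# One patient record in the "Data In" format (dates as MM-DD-YYYY).
patient = {
    "patient_id": "P000038",
    "sex": "F",
    "birth_date": "01-01-2006",
    "DX_record": [{"date": "07-31-2006", "code": "Z38.00"}],
    "RX_record": [],
    "PROC_record": [{"date": "02-05-2007", "code": "90723"}],
}

request = build_zcor_request([patient], target="IPF", api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)

# Sending is left commented out because it needs network access and a valid key:
# from urllib.request import urlopen
# with urlopen(request) as response:
#     predictions = json.load(response)["predictions"]
```

Keeping request construction separate from sending makes the payload easy to inspect before any patient data leaves the local machine.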

Cohort Selection and Risk Analysis Testbed

Baseline prevalence of IPF in ILD patients

 

~25%

 

ZCoR PPV: 60% @ 50% sensitivity

1310 positive patients from 2183 flags 

screen failure: 

~70% \(\rightarrow\) 40%

Selection comparison against baseline of 2+ ILD risk factors

baseline prevalence: ~2%

projected screen failure: 

~98% baseline \(\rightarrow\) 45%
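The screen-failure figures follow from the share of selected patients who turn out not to have the target condition. The sketch below checks the arithmetic using the counts quoted above (1310 positives out of 2183 flags) and the ~2% baseline prevalence. Both helper names are hypothetical, and treating screen failure as one minus PPV is a simplifying assumption that ignores other trial exclusion criteria.

```python
def screen_failure_rate(true_positives: int, selected: int) -> float:
    """Fraction of selected patients who do not have the target condition."""
    return 1.0 - true_positives / selected


def baseline_failure(prevalence: float) -> float:
    """Screen failure when patients are selected without enrichment."""
    return 1.0 - prevalence


zcor_ppv = 1310 / 2183
zcor_failure = screen_failure_rate(1310, 2183)

print(f"PPV among ZCoR flags:       {zcor_ppv:.1%}")
print(f"screen failure among flags: {zcor_failure:.1%}")
print(f"baseline failure at 2%:     {baseline_failure(0.02):.0%}")
```

The counts reproduce the quoted 60% PPV and 40% screen failure, and a 2% baseline prevalence reproduces the ~98% baseline failure.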

Patient Journeys for IPF: Tracking increasing Risk Over Time

Up to 4-year "signal" resolution

patient journey

Other Examples

decreases risk

increases risk

Risk decreases sometimes

new codes change trajectory as they are revealed

Take-Home Message

Conclusions

  • Medicine is poised to enter a transformative era, bolstered by the emergence of sophisticated Artificial Intelligence (AI) models.
  • Immense potential to reshape the realm of early disease diagnosis, prevention, and treatment strategies.
  • Accelerate scientific discovery towards deeper understanding of complex etiologies
  • Enable more holistic approaches to medicine, where predictive patterns can be rapidly recognized and exploited

Reading (References)

Onishchenko, Dmytro, Yi Huang, James van Horne, Peter J. Smith, Michael E. Msall, and Ishanu Chattopadhyay. “Reduced False Positives in Autism Screening via Digital Biomarkers Inferred from Deep Comorbidity Patterns.” Science Advances 7, no. 41 (October 8, 2021). https://doi.org/10.1126/sciadv.abf0354.

 

Onishchenko, Dmytro, Daniel S. Rubin, James R. van Horne, R. Parker Ward, and Ishanu Chattopadhyay. “Cardiac Comorbidity Risk Score: Zero‐Burden Machine Learning to Improve Prediction of Postoperative Major Adverse Cardiac Events in Hip and Knee Arthroplasty.” Journal of the American Heart Association 11, no. 15 (August 2, 2022). https://doi.org/10.1161/jaha.121.023745.

 

Onishchenko, Dmytro, Robert J. Marlowe, Che G. Ngufor, Louis J. Faust, Andrew H. Limper, Gary M. Hunninghake, Fernando J. Martinez, and Ishanu Chattopadhyay. “Screening for Idiopathic Pulmonary Fibrosis Using Comorbidity Signatures in Electronic Health Records.” Nature Medicine 28, no. 10 (September 29, 2022): 2107–16. https://doi.org/10.1038/s41591-022-02010-y.

 

Huang, Yi, Victor Rotaru, and Ishanu Chattopadhyay. “Sequence Likelihood Divergence for Fast Time Series Comparison.” Knowledge and Information Systems 65, no. 7 (March 16, 2023): 3079–98. https://doi.org/10.1007/s10115-023-01855-0.

 

Brenner, Lisa A., Lisa M. Betthauser, Molly Penzenik, Anne Germain, Jin Jun Li, Ishanu Chattopadhyay, Ellen Frank, David J. Kupfer, and Robert D. Gibbons. “Development and Validation of Computerized Adaptive Assessment Tools for the Measurement of Posttraumatic Stress Disorder among US Military Veterans.” JAMA Network Open 4, no. 7 (2021): e2115707.

QUESTIONS

ishanu@uchicago.edu

@ishanu_ch

Delving Deeper into Learning Goals

  • Early screening of complex diseases by leveraging deep pattern discovery in history of medical encounters

  • Use AI to transform  the landscape of early disease diagnosis, prevention, and treatment strategies for complex medical conditions.

  • Realize universal primary care low-burden screening for disorders for which potentially no recommended screening tools exist currently

  • Generalize beyond known “risk factors”, uncover personalized predictors of future risk of serious diseases from subtle comorbidity signatures

Grand Round Mt Sinai

By Ishanu Chattopadhyay
