Near-zero-knowledge Pattern Discovery for Universal Screening for Complex Disorders

Ishanu Chattopadhyay, PhD

Assistant Professor of Medicine

University of Chicago

ishanu@uchicago.edu

09.12.2023

Disclosures

Ishanu Chattopadhyay, Ph.D., faculty for this educational activity, is the founder of Zero Burden Labs, Inc., and an advisor for Adiona Health. He also receives funding from the National Institute on Aging, the Alzheimer's Association, and the DARPA Defense Sciences Office and Biological Technologies Office. He has indicated that the presentation will not include off-label or unapproved product usage. All of the relevant financial relationships listed for this individual have been mitigated.

Learning Objectives

What is AI/Machine Learning? What are its key applications in medicine? What does it bring to the table for Health Services and Biomedicine? Are there new questions that we can answer? Does it suffice to draw on off-the-shelf models? What are the new and emerging ideas?

Application of AI in Biomedicine: Why We Need a “Bio”-AI.
Emerging tools for addressing Late and Missed Diagnosis in Primary Care
Why “risk factors” are often not predictive enough, and how to think about more personalized predictors of future risk of serious diseases

CARC presentation

Delving Deeper into Learning Goals

Early screening of complex diseases by leveraging deep pattern discovery in the history of medical encounters
Use AI to transform early disease diagnosis, prevention, and treatment strategies for complex medical conditions
Realize universal, low-burden screening in primary care for disorders that may currently have no recommended screening tools
Generalize beyond known “risk factors” and uncover personalized predictors of future risk of serious diseases from subtle comorbidity signatures

Universality: the Need for "bio"-AI

Autism

Idiopathic Pulmonary Fibrosis

Alzheimer's Disease and related dementia

Suicidality, PTSD

Perioperative Cardiac Event

Aggressive Melanoma

Uterine Cancer

Pancreatic Cancer

...

complex, expensive, time-consuming diagnostic tests

Lack of Universal Screening at the point of care

Early diagnosis is difficult; late or missed diagnosis costs lives

Zero-burden EHR Analytics

Diagnostic & Screening for complex disorders

*CoR: Comorbid Risk Scores

ACoR (Autism)

PCoR (IPF/ILD)

ZCoR (ADRD/AD)

ZCoR-C (cancers with further specialization)

Leverage Vast Patient EHR and Insurance Claims Database(s)

Truven MarketScan (IBM)
Commercial Claims & Encounters Database

2003-2018

87M patients visible > 1 year

>7B individual claims

>87K unique diagnostic codes

>7% Medicare data present

Why are ML/AI models generally complicated and non-transparent?

What is Data?

shallow
mechanically gathered
systematic record of information

individual data points are not so important

Tabularia in ancient Rome: storehouses of receipts from individual purchases, used to monitor the state of commerce (78 B.C.)

Tycho Brahe (1546-1601)

Johannes Kepler (1571-1630)

Newtonian theory of Universal Gravitation (1684)

30,000 experiments

Starting point of modern genetics

Mendel's Laws of Genetics

Johann Gregor Mendel (1822–1884)

Is this Big data?

Big data?

Some datasets are large, but simple: easily compressible or representable

Others are not:

intrinsic complexity
not representable by simple rules of generation

"big data" has irreducible complexity

Hence, "models" must have the capacity to accommodate this complexity
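The distinction between large-but-simple and intrinsically complex data can be made concrete with an off-the-shelf compressor. This is only an illustration under a stated assumption: it uses zlib's compressed size as a crude, upper-bound proxy for how simply a dataset can be described, which is not a method from the talk. A large but repetitive byte string shrinks to a tiny fraction of its size, while random bytes barely compress at all.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size / original size; lower means more compressible."""
    return len(zlib.compress(data, 9)) / len(data)


N = 1_000_000  # size of each toy dataset, in bytes

# Large but simple: one short pattern repeated over and over.
patterned = (b"ABCD" * (N // 4))[:N]

# Large and (almost surely) incompressible: bytes from the OS random source.
random_bytes = os.urandom(N)

print(f"patterned: {compression_ratio(patterned):.4f}")
print(f"random:    {compression_ratio(random_bytes):.4f}")
```

The patterned ratio comes out far below 1. The random ratio stays near 1, because zlib finds no structure to exploit and adds a little framing overhead.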

Machine Learning and AI allow us to find "theories" that are no longer specifiable as simple equations, but require billions of parameters to specify

Aided by AI, the Scientific Method may be updated

More importantly...

Medical history

co-morbidities

lifestyle

genetics

environment

Estimate disease risk

Estimate prognosis

Reduce missed and delayed diagnosis

Find prodromal patients for clinical trials

The Age of Data

Risk

Machine Learning is poised to transform clinical discovery and outcome research

But we are not quite there yet...

Autism Spectrum Disorder + AI

Idiopathic Pulmonary Fibrosis + AI

Literature Search: AI + Target Disease

Current AI Applications are limited in practice

Are ML predictions pertaining to clinical diagnoses adding anything of relevance?

"predicting" autism > 3yrs
"predicting" autism with detailed videos on toddler behavior
"diagnosing" lung disease from lung imaging
"diagnosing" Alzheimer's Disease or cognitive disorder from detailed brain scan

Risk

The Key Stumbling Block: Features

How to find good features?

Good features

relevant risk factors

Must do pattern discovery

Discover factors that modulate risk, beyond what is already known

Must account for the possibility of non-causal spurious associations

Lesson

The need for Universal Screening

Often the problem is not that physicians cannot diagnose the disease, but that diagnoses are missed or made late in the primary care workflow
Universal screening does not exist for many diseases
Tools that exist often yield "obvious" results

Takes too long,

not supported by insurance,

"gut feeling" / "wait & see" approaches are common

IPF diagnosed from lung imaging using CNN

Alzheimer's diagnosed from brain scan

Autism diagnosed by "AI" after 3 years

Good for writing papers, not clinically useful

1 in 59

Autism Spectrum Disorder

ASD: Ineffective screening causes delays and incurs costs

Autistic children experience higher rates of comorbidities

Can we exploit these patterns to predict diagnosis?

Common Knowledge: Comorbidities Exist

Source: IBM MarketScan data

Autism Co-morbid Risk (ACoR) Score

Data: Onishchenko et al., Science Advances 2021

Autism Co-morbid Risk (ACoR) Score

M-CHAT/F

Head-to-head comparison with current practice

Data: Onishchenko et al., Science Advances 2021

Autism Co-morbid Risk (ACoR) Score

Importance of different comorbidity categories

17 categories chosen:

immune | infections | endocrine | ...
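The slide names only a few of the 17 categories. As a rough illustration of what grouping diagnostic codes into categories over time can look like, the sketch below bins one patient's dated codes into a weekly binary series per category. Everything here is an assumption for illustration: the three-category map, the ICD-10-CM prefixes, the weekly bins, and the 0/1 encoding. None of it is the encoding used in Onishchenko et al. 2021, and `CATEGORY_PREFIXES`, `categorize`, and `weekly_category_series` are hypothetical names.

```python
from datetime import date
from typing import Dict, Iterable, List, Optional, Tuple

# HYPOTHETICAL category map (ICD-10-CM chapter prefixes), for illustration only;
# it is not the 17-category partition used in the cited study.
CATEGORY_PREFIXES: Dict[str, Tuple[str, ...]] = {
    "infections": ("A", "B"),  # A00-B99 infectious and parasitic diseases
    "immune": ("D8",),         # D80-D89 disorders involving the immune mechanism
    "endocrine": ("E",),       # E00-E89 endocrine, nutritional, metabolic
}


def categorize(code: str) -> Optional[str]:
    """Return the category of a diagnostic code, or None if it is unmapped."""
    for category, prefixes in CATEGORY_PREFIXES.items():
        if code.startswith(prefixes):
            return category
    return None


def weekly_category_series(
    encounters: Iterable[Tuple[date, str]], start: date, n_weeks: int
) -> Dict[str, List[int]]:
    """One binary series per category: 1 if any code in that category
    was recorded during that week, else 0."""
    series = {cat: [0] * n_weeks for cat in CATEGORY_PREFIXES}
    for day, code in encounters:
        week = (day - start).days // 7
        category = categorize(code)
        if category is not None and 0 <= week < n_weeks:
            series[category][week] = 1
    return series


history = [
    (date(2020, 1, 3), "J45.909"),   # asthma: not in the toy map, ignored
    (date(2020, 1, 9), "E11.9"),     # type 2 diabetes -> endocrine
    (date(2020, 1, 20), "B34.9"),    # viral infection -> infections
]
print(weekly_category_series(history, start=date(2020, 1, 1), n_weeks=4))
# {'infections': [0, 0, 1, 0], 'immune': [0, 0, 0, 0], 'endocrine': [0, 1, 0, 0]}
```

A real pipeline would need a complete code-to-category map covering both ICD-9 and ICD-10 eras, and a model that learns how these per-category streams modulate one another.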

Data: Onishchenko et al., Science Advances 2021

We automatically infer how different patterns depend and modulate each other to impact overall risk

Joint Operation with M-CHAT

PPV = \frac{1}{1+\frac{1-c}{s}\left(\frac{1}{p}-1\right)}, where s is sensitivity, c is specificity, and p is prevalence
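In the PPV expression, s is sensitivity, c is specificity, and p is prevalence. The sketch below evaluates the formula and checks it against Bayes' rule written as true positives over all positives. It then shows how, at a fixed sensitivity, modest gains in specificity raise PPV substantially when prevalence is low. The prevalence 1/59 comes from the earlier slide. The sensitivity and specificity values are made up for illustration and are not M-CHAT or ACoR results.

```python
def ppv(s: float, c: float, p: float) -> float:
    """PPV in the slide's form: 1 / (1 + ((1 - c) / s) * (1/p - 1))."""
    return 1.0 / (1.0 + ((1.0 - c) / s) * (1.0 / p - 1.0))


def ppv_bayes(s: float, c: float, p: float) -> float:
    """Same quantity written as true positives / all positives."""
    true_pos = s * p
    false_pos = (1.0 - c) * (1.0 - p)
    return true_pos / (true_pos + false_pos)


p = 1 / 59  # prevalence from the "1 in 59" slide
s = 0.5     # hypothetical sensitivity, held fixed
for c in (0.95, 0.98, 0.99):  # hypothetical specificities
    print(f"specificity {c:.2f}: PPV = {ppv(s, c, p):.3f}")
# specificity 0.95: PPV = 0.147
# specificity 0.98: PPV = 0.301
# specificity 0.99: PPV = 0.463
```

The two functions agree algebraically. Dividing numerator and denominator of `ppv_bayes` by `s * p` gives the slide's form.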

The CHOP study allows us to see the effectiveness of M-CHAT in different subpopulations

Modulate sensitivity/specificity trade-offs

Data: Onishchenko et al., Science Advances 2021

Application 2: Rapid Universal Point-of-care Screening for ILD/IPF Using Comorbidity Signatures in Electronic Health Records

shortness of breath

dry cough

doctor can hear velcro crackles

Common Symptoms

>50 years old

more men than women

IPF

Rare disease: ~5 in 10,000

Post-Dx survival: ~4 years

At least one misdiagnosis: ~55%

Two or more misdiagnoses: 38%

Initially attributed to age-related symptoms: 72%

Cannot always be seen on CXR

Non-specific symptoms

PCP workflow demands

Timeline: ZCoR screening flags ~4 yrs before the current clinical Dx; current survival after Dx is ~4 yrs

Onishchenko, D., Marlowe, R.J., Ngufor, C.G. et al. Screening for idiopathic pulmonary fibrosis using comorbidity signatures in electronic health records. Nat Med 28, 2107–2116 (2022). https://doi.org/10.1038/s41591-022-02010-y

n=~3M

AUC~90%

Likelihood ratio ~30
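The slide reports AUC ~90%. One standard reading of AUC is the probability that a randomly chosen case receives a higher risk score than a randomly chosen control. The sketch below computes AUC directly from that definition on invented scores. Nothing here reproduces the study's numbers, and `auc_by_pairs` is a hypothetical helper name.

```python
from itertools import product
from typing import Sequence


def auc_by_pairs(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Fraction of (case, control) pairs where the case scores higher; ties count 1/2.

    Quadratic in the number of patients: fine for a toy example, far too slow
    for millions of records (a rank-based formula would be used instead).
    """
    wins = 0.0
    for case, control in product(case_scores, control_scores):
        if case > control:
            wins += 1.0
        elif case == control:
            wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


cases = [0.9, 0.8, 0.4]            # made-up risk scores for patients who develop the disease
controls = [0.1, 0.3, 0.5, 0.85]   # made-up risk scores for patients who do not
print(auc_by_pairs(cases, controls))  # 0.75
```

Here 9 of the 12 case-control pairs are ordered correctly, giving 0.75.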

Conventional AI/ML attempts to model the physician

AI in IPF Research

Comorbidity patterns
No data demands
Use whatever data is already in the patient's file


Primary Care

Pulmonologist

ZCoR Flag

No blood tests
No imaging
No pulmonary function tests

Study design from ICD administrative codes (targets: IPF, ILD): cases, in whom target codes appear after their past medical history, vs controls, in whom no target codes appear (2-yr window)
Signature of IPF diagnostic sequence

IPF drugs prescribed: pirfenidone or nintedanib

age > 50 years
at least two IPF target codes identified at least 1 month apart
chest CT procedure (ICD-9-CM 87.41 and Current Procedural Terminology, 4th Edition, codes 71250, 71260 and 71270) before the first diagnostic claim for IPF
no claims for alternative ILD codes occurring on or after the first IPF claim
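The inclusion criteria above amount to a rule-based filter over each patient's claims. Below is a minimal sketch, assuming each patient is reduced to an age and a list of (date, code) claims. The chest CT codes are the ones listed above. `IPF_CODES` and `ALT_ILD_CODES` are hypothetical placeholders, since the exact code lists are not given here. "1 month" is taken as 30 days, and age is assumed to be a single value supplied with the record. The drug signal (pirfenidone or nintedanib) is not modeled.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Tuple

# Chest CT procedure codes listed on the slide (ICD-9-CM 87.41; CPT 71250/71260/71270).
CHEST_CT_CODES = {"87.41", "71250", "71260", "71270"}

# HYPOTHETICAL placeholders: the real IPF and alternative-ILD code lists are not given here.
IPF_CODES = {"IPF_CODE"}
ALT_ILD_CODES = {"ALT_ILD_CODE"}


@dataclass
class Patient:
    age: int
    claims: List[Tuple[date, str]] = field(default_factory=list)  # (claim date, code)


def meets_ipf_case_definition(pt: Patient, min_gap: timedelta = timedelta(days=30)) -> bool:
    """Apply the four listed criteria to one patient's claims."""
    ipf_dates = sorted(d for d, c in pt.claims if c in IPF_CODES)
    # Age > 50 and at least two IPF claims.
    if pt.age <= 50 or len(ipf_dates) < 2:
        return False
    first_ipf = ipf_dates[0]
    # Some pair of IPF claims at least ~1 month apart (earliest vs latest suffices).
    if ipf_dates[-1] - first_ipf < min_gap:
        return False
    # A chest CT claim strictly before the first IPF claim.
    if not any(c in CHEST_CT_CODES and d < first_ipf for d, c in pt.claims):
        return False
    # No alternative ILD claims on or after the first IPF claim.
    if any(c in ALT_ILD_CODES and d >= first_ipf for d, c in pt.claims):
        return False
    return True


pt = Patient(age=67, claims=[
    (date(2015, 1, 5), "71250"),     # chest CT
    (date(2015, 3, 1), "IPF_CODE"),  # first IPF claim
    (date(2015, 5, 2), "IPF_CODE"),  # second IPF claim, 62 days later
])
print(meets_ipf_case_definition(pt))  # True
```

Comparing only the earliest and latest IPF claims is enough: if any two claims are at least a month apart, so are the first and last.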

Truven MarketScan (IBM)
Commercial Claims & Encounters Database
2003-2018

>100M patients visible

>7B individual claims

>87K unique diagnostic codes

>7% Medicare data present

2,053,277 patients included in study

University of Chicago Medical Center
2012-2021

68,658 patients

Random sample from the OptumLabs Data Warehouse, courtesy of Mayo Clinic

861,280 patients

2,983,215 patients in total

Data: Onishchenko et al., Nat. Medicine 2022

MarketScan Out-of-sample Results (IPF and ILD)

specificity ~99%

NPV >99.9%

UCM Out-of-sample Results (IPF and ILD)

specificity ~99%

NPV >99.9%
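For a rare disease, NPV is driven largely by prevalence. The sketch below computes NPV from sensitivity s, specificity c, and prevalence p, using the ~5 in 10,000 prevalence and ~99% specificity quoted above. The sensitivities are hypothetical values chosen only to show how little NPV moves with them.

```python
def npv(s: float, c: float, p: float) -> float:
    """Negative predictive value from sensitivity s, specificity c, prevalence p."""
    true_neg = c * (1.0 - p)
    false_neg = (1.0 - s) * p
    return true_neg / (true_neg + false_neg)


p = 5 / 10_000  # ~5 in 10,000, the rough IPF prevalence quoted earlier
c = 0.99        # specificity ~99%, as reported above
for s in (0.3, 0.5, 0.7, 0.9):  # hypothetical sensitivities
    print(f"sensitivity {s:.1f}: NPV = {npv(s, c, p):.5f}")
```

Every line prints an NPV above 0.999. At this prevalence, a high NPV is therefore best read together with sensitivity and PPV.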

False positives:

Healthcare capacity

Ethics

Risk from imaging tests

For every 20-30 flags, 1 is a true positive

General likelihood ratio 60-80
PPV 3.5-5%
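The flag counts, likelihood ratio, and PPV above are linked by simple arithmetic. The positive likelihood ratio is sensitivity / (1 - specificity). Post-test odds equal the likelihood ratio times pre-test odds. The number of flags per true positive is 1 / PPV. The sketch below writes these relations as small functions. The function names are hypothetical, and prevalence is an explicit input because the slide does not state the prevalence behind its numbers.

```python
def lr_positive(s: float, c: float) -> float:
    """Positive likelihood ratio: sensitivity / (1 - specificity)."""
    return s / (1.0 - c)


def post_test_probability(lr: float, prevalence: float) -> float:
    """P(disease | positive flag) via odds: post-test odds = LR x pre-test odds."""
    pre_odds = prevalence / (1.0 - prevalence)
    post_odds = lr * pre_odds
    return post_odds / (1.0 + post_odds)


def flags_per_true_positive(ppv: float) -> float:
    """Average number of flags raised for each true case found."""
    return 1.0 / ppv


for ppv in (0.035, 0.05):  # PPV range quoted on the slide
    print(f"PPV {ppv:.1%}: about {flags_per_true_positive(ppv):.0f} flags per true positive")
```

A PPV of 3.5-5% corresponds to about 29 and 20 flags per true positive, matching the "1 in 20-30" reading. The PPV implied by a given likelihood ratio still depends on the prevalence in the screened population.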

Notifying patients 4 years early?

No cure, so why screen?

minimal

acceptable?

Better outcomes

early anti-fibrotic therapy seems increasingly promising

better shot at lung transplant

early Dx reduces hospitalizations by a factor of 1-3

Collard, Harold R., Alex J. Ward, Stephan Lanes, D. Cortney Hayflinger, Daniel M. Rosenberg, and Elke Hunsche. "Burden of illness in idiopathic pulmonary fibrosis." Journal of medical economics 15, no. 5 (2012): 829-835.

Alzheimer's Disease and Related Dementia*

* in press

>5 million in the US; >13 million expected in the next 10 years

Alzheimer's Disease and Related Dementia

Current practice: MoCA, blood tests

State of the art with EHR: ~67% AUC*

ZCoR: ~87% AUC


Preempting ADRD accurately up to a decade in the future

Applicable To Screening for Mild Cognitive Impairment

Clinical Trial Participant Selection

Current screen-failure rate: 80-90%

Estimated rate with ZCoR: 40%
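A screen-failure rate translates directly into how many candidates must be screened per enrolled participant. The sketch below shows that arithmetic, assuming each candidate fails screening independently with the given probability. The helper name is hypothetical, and the rates are the ones quoted above.

```python
def screened_per_enrolled(screen_failure_rate: float) -> float:
    """Expected candidates screened per participant enrolled, assuming each
    candidate independently fails screening with the given probability."""
    return 1.0 / (1.0 - screen_failure_rate)


rates = [
    ("current, low end", 0.80),
    ("current, high end", 0.90),
    ("estimated with ZCoR", 0.40),
]
for label, rate in rates:
    print(f"{label}: {screened_per_enrolled(rate):.1f} screened per enrolled participant")
```

At current rates this means 5 to 10 candidates screened per enrolled participant. At a 40% failure rate it falls to about 1.7.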

Application to Suicide Attempts and Ideation (SISA) and PTSD*:

a perhaps surprising connection between mood disorders and physiological comorbidities

Gibbons RD, Kupfer D, Frank E, Moore T, Beiser DG, Boudreaux ED. Development of a Computerized Adaptive Test Suicide Scale-The CAT-SS. J Clin Psychiatry. 2017 Nov/Dec;78(9):1376-1382. doi: 10.4088/JCP.16m10922. PMID: 28493655.

* in press

Application to Malignant Neoplasms*

Melanoma

Melanoma has a high survival rate of over 90% when treated early. But if it progresses to later stages, the survival rate drops significantly. Identifying potentially life-threatening melanomas is crucial.

* in press

Take-Home Message, Conclusions, and Next Steps

Medicine is poised to enter a transformative era, bolstered by the emergence of sophisticated Artificial Intelligence (AI) models.
Immense potential to reshape early disease diagnosis, prevention, and treatment strategies.
Accelerate scientific discovery toward a deeper understanding of complex etiologies
Enable more holistic approaches to medicine, where predictive patterns can be rapidly recognized and exploited

Reading (References)

Onishchenko, Dmytro, Yi Huang, James van Horne, Peter J. Smith, Michael E. Msall, and Ishanu Chattopadhyay. “Reduced False Positives in Autism Screening via Digital Biomarkers Inferred from Deep Comorbidity Patterns.” Science Advances 7, no. 41 (October 8, 2021). https://doi.org/10.1126/sciadv.abf0354.

Onishchenko, Dmytro, Daniel S. Rubin, James R. van Horne, R. Parker Ward, and Ishanu Chattopadhyay. “Cardiac Comorbidity Risk Score: Zero‐Burden Machine Learning to Improve Prediction of Postoperative Major Adverse Cardiac Events in Hip and Knee Arthroplasty.” Journal of the American Heart Association 11, no. 15 (August 2, 2022). https://doi.org/10.1161/jaha.121.023745.

Onishchenko, Dmytro, Robert J. Marlowe, Che G. Ngufor, Louis J. Faust, Andrew H. Limper, Gary M. Hunninghake, Fernando J. Martinez, and Ishanu Chattopadhyay. “Screening for Idiopathic Pulmonary Fibrosis Using Comorbidity Signatures in Electronic Health Records.” Nature Medicine 28, no. 10 (September 29, 2022): 2107–16. https://doi.org/10.1038/s41591-022-02010-y.

Huang, Yi, Victor Rotaru, and Ishanu Chattopadhyay. “Sequence Likelihood Divergence for Fast Time Series Comparison.” Knowledge and Information Systems 65, no. 7 (March 16, 2023): 3079–98. https://doi.org/10.1007/s10115-023-01855-0.

Brenner, Lisa A., Lisa M. Betthauser, Molly Penzenik, Anne Germain, Jin Jun Li, Ishanu Chattopadhyay, Ellen Frank, David J. Kupfer, and Robert D. Gibbons. “Development and Validation of Computerized Adaptive Assessment Tools for the Measurement of Posttraumatic Stress Disorder among US Military Veterans.” JAMA Network Open 4, no. 7 (2021): e2115707.

Grand Round 1

By Ishanu Chattopadhyay
