Near-zero-knowledge Pattern Discovery for Universal Screening for Complex Disorders

Ishanu Chattopadhyay, PhD

Assistant Professor of Medicine

University of Chicago


Ishanu Chattopadhyay, Ph.D., faculty for this educational activity, is the founder of Zero Burden Labs, Inc., and an advisor for Adiona Health. He also receives funding from the National Institute on Aging, the Alzheimer's Association, the DARPA Defense Sciences Office, and the DARPA Biological Technologies Office. He has indicated that the presentation will not include off-label or unapproved product usage. All of the relevant financial relationships listed for this individual have been mitigated.



Learning Objectives

What is AI/machine learning? What are its key applications in medicine? What does it bring to the table in health services and biomedicine? Are there new questions that we can answer? Does it suffice to draw on off-the-shelf models? What are the new/emerging ideas?

  • Application of AI in Biomedicine: Why We Need a “Bio”-AI.

  • Emerging tools for addressing Late and Missed Diagnosis in Primary Care

  • Why “risk factors” are often not predictive enough, and how to think about more personalized predictors of future risk of serious diseases

CARC presentation 

Delving Deeper into Learning Goals

  • Early screening of complex diseases by leveraging deep pattern discovery in history of medical encounters

  • Use AI to transform the landscape of early disease diagnosis, prevention, and treatment strategies for complex medical conditions.

  • Realize universal, low-burden primary care screening for disorders for which recommended screening tools may not currently exist

  • Generalize beyond known “risk factors”, uncover personalized predictors of future risk of serious diseases from subtle comorbidity signatures

Universality: the Need for "bio"-AI


Idiopathic Pulmonary Fibrosis

Alzheimer's Disease and related dementia

Suicidality, PTSD

Perioperative Cardiac Event

Aggressive Melanoma

Uterine Cancer

Pancreatic Cancer


  • Complex, expensive, time-consuming diagnostic tests
  • Lack of universal screening at the point of care
  • Early diagnosis is difficult; late or missed diagnosis costs lives

Zero-burden EHR Analytics

Diagnostic & Screening for complex disorders

*CoR: Comorbid Risk Scores

ACoR (Autism)



ZCoR-C (cancers with further specialization)

Leverage Vast Patient EHR and Insurance Claims Database(s)

Truven MarketScan (IBM)
Commercial Claims & Encounters Database


87M patients visible for > 1 year

>7B individual claims

>87K unique diagnostic codes


>7% Medicare data present

Why are ML/AI models generally complicated and non-transparent?

What is Data?

  • shallow
  • mechanically gathered
  • systematic record of information

Individual data points are not so important

Tabularia in ancient Rome – storehouses of receipts from individual purchases, used to monitor the state of commerce (78 B.C.)

Tycho Brahe


Johannes Kepler (1571-1630)

Newtonian theory of Universal Gravitation (1684)

30,000 experiments

Starting point of modern genetics

Mendel's Laws of Genetics

Johann Gregor Mendel (1822–1884)

Is this Big data?

Big data?

Some datasets are large, but simple: easily compressible or representable


Others are not.

  • intrinsic complexity
  • not representable by simple rules of generation

"big data" has irreducible complexity


Hence, "models" must have capacity to accommodate this complexity
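A quick way to see the difference between "large but simple" and "intrinsically complex" data is to compress it. The sketch below uses Python's standard `zlib` as a rough, practical stand-in for descriptive complexity (the ideal measure, Kolmogorov complexity, is not computable); the two synthetic byte strings are illustrative assumptions, not medical data.

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size / raw size; near 0 means highly regular, near 1 means no exploitable rule."""
    return len(zlib.compress(data, 9)) / len(data)

n = 100_000
regular = b"ACGT" * (n // 4)   # large but simple: one short pattern repeated
irregular = os.urandom(n)      # same size, no generating rule to exploit

print(f"regular:   {compression_ratio(regular):.4f}")
print(f"irregular: {compression_ratio(irregular):.4f}")
```

The repeated pattern shrinks to well under 1% of its size, while the random bytes do not shrink at all: there is no simple rule of generation for the compressor to exploit.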

Machine learning and AI allow us to find "theories" that are no longer specifiable as simple equations, but require billions of parameters to specify

aided by AI

The Scientific Method may be updated

More importantly...

Medical history






Estimate disease risk

Estimate prognosis

Reduce missed and delayed diagnosis

Find prodromal patients for clinical trials

The Age of Data


Machine Learning is poised to transform clinical discovery and outcome research

But we are not quite there yet.

Autism Spectrum Disorder + AI

Idiopathic Pulmonary Fibrosis + AI

Literature Search: AI + Target Disease

Current AI Applications are limited in practice

Are ML predictions pertaining to clinical diagnoses adding anything of relevance?

  • "predicting" autism after age 3
  • "predicting" autism from detailed videos of toddler behavior
  • "diagnosing" lung disease from lung imaging
  • "diagnosing" Alzheimer's Disease or cognitive disorder from detailed brain scan


The Key Stumbling Block: Features

How to find good features?

Good features

relevant risk factors

Must do pattern discovery


Discover factors that modulate risk, beyond what is already known


Must account for the possibility of non-causal spurious associations


The need for Universal Screening

  • Often the problem is not that diseases cannot be diagnosed by physicians, but one of missed or late diagnoses in the primary care workflow
  • Universal screening is non-existent for many diseases
  • Tools that exist often yield "obvious" results

Takes too long,

not supported by insurance,

"gut feeling" / "wait & see" common

IPF diagnosed from lung imaging using CNN

Alzheimer's diagnosed from brain scan

Autism diagnosed by "AI" after 3 years

Good for writing papers, not clinically useful

1 in 59

Autism Spectrum Disorder

ASD: Ineffective screening causes delays and incurs costs

Autistic children experience higher co-morbidities

Can we exploit these patterns to predict diagnosis?

Common Knowledge: Comorbidities Exist

Source: IBM MarketScan data

Autism Co-morbid Risk (ACoR) Score

Data: Onishchenko et al. Science Advances 2021

Autism Co-morbid Risk (ACoR) Score


Head to head comparison with current practice

Data: Onishchenko et al. Science Advances 2021

Autism Co-morbid Risk (ACoR) Score

Importance of different comorbidity categories

17 categories chosen:

immune | infections | endocrine | ...

Data: Onishchenko et al. Science Advances 2021

We automatically infer how different patterns depend on and modulate each other to impact overall risk
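To make "comorbidity categories" concrete, here is one generic way a diagnosis history could be turned into per-category time series that a pattern-discovery model can consume. This is a hypothetical sketch, not the ACoR pipeline (whose details are in the cited paper): the three categories, their ICD-10 prefix mappings, the weekly resolution, and the function name `weekly_category_sequences` are all assumptions.

```python
from datetime import date

# HYPOTHETICAL mapping from ICD-10 code prefixes to comorbidity categories.
# The study uses 17 categories; these three and their prefixes are illustrative only.
CATEGORY_PREFIXES = {
    "infections": ("A", "B"),   # ICD-10 chapter A00-B99
    "endocrine": ("E",),        # ICD-10 chapter E00-E89
    "immune": ("D8",),          # ICD-10 block D80-D89
}

def weekly_category_sequences(records, start: date, n_weeks: int) -> dict:
    """Turn (date, ICD-10 code) records into one 0/1 sequence per category.

    Entry w of a category's sequence is 1 if any code in that category was
    recorded during week w after `start`, else 0.
    """
    seqs = {cat: [0] * n_weeks for cat in CATEGORY_PREFIXES}
    for day, code in records:
        week = (day - start).days // 7
        if not 0 <= week < n_weeks:
            continue  # outside the observation window
        for cat, prefixes in CATEGORY_PREFIXES.items():
            if code.startswith(prefixes):  # str.startswith accepts a tuple
                seqs[cat][week] = 1
    return seqs

records = [(date(2020, 1, 3), "J45.909"), (date(2020, 1, 10), "A09"), (date(2020, 1, 20), "E66.9")]
print(weekly_category_sequences(records, date(2020, 1, 1), n_weeks=4))
```

Codes outside the mapped categories (here the asthma code) simply leave every sequence unchanged; a real pipeline would need a complete category map.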

Joint Operation with M-CHAT

\mathrm{PPV} = \frac{1}{1 + \frac{1-c}{s}\left(\frac{1}{p} - 1\right)}

where s is sensitivity, c is specificity, and p is prevalence
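The PPV formula can be checked numerically. The sketch below reads s as sensitivity, c as specificity, and p as prevalence (the standard reading of this expression, which is Bayes' rule rearranged); the operating points are invented for illustration and are not values from the study.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV = 1 / (1 + (1 - c)/s * (1/p - 1)), i.e. Bayes' rule rearranged."""
    s, c, p = sensitivity, specificity, prevalence
    return 1.0 / (1.0 + (1.0 - c) / s * (1.0 / p - 1.0))

# Hypothetical operating points at 2% prevalence (not values from the study).
for spec in (0.95, 0.99):
    print(f"sensitivity 0.50, specificity {spec:.2f}: PPV = {ppv(0.5, spec, 0.02):.2f}")
```

With these made-up numbers, raising specificity from 0.95 to 0.99 lifts PPV from about 0.17 to about 0.51 at fixed sensitivity, illustrating why specificity tends to dominate PPV at low prevalence.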

The CHOP study allows us to see the effectiveness of M-CHAT in different subpopulations

Modulate sensitivity/specificity trade-offs

Data: Onishchenko et al. Science Advances 2021

Rapid Universal Point-of-care Screening for ILD/IPF Using Comorbidity Signatures in Electronic Health Records

Application 2:

shortness of breath

dry cough

Doctor can hear Velcro-like crackles

Common Symptoms

>50 years old

more men than women


Rare disease

~5 in 10,000



~4 years

At least one misdiagnosis


Two or more misdiagnoses


Initially attributed to age-related symptoms:


Cannot always be seen on CXR

Non-specific symptoms

PCP workflow demands

[Figure: timeline comparing current clinical Dx with ZCoR screening, with ~4-year intervals marked; current survival ~4 yrs]

Onishchenko, D., Marlowe, R.J., Ngufor, C.G. et al. Screening for idiopathic pulmonary fibrosis using comorbidity signatures in electronic health records. Nat Med 28, 2107–2116 (2022).



Likelihood ratio ~30

Conventional AI/ML attempts to model the physician

AI in IPF Research

  • Co-morbidity Patterns
  • No data demands
  • Use whatever data is already on patient file

Primary Care


ZCoR Flag

  • No blood tests
  • No imaging
  • No pulmonary function tests

[Figure: timelines of ICD administrative codes, contrasting the past medical history of patients in whom target codes appear with that of patients in whom no target codes appear]
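The case/control split in this figure can be expressed as a labeling step: a patient is a case if a target code appears, and only history recorded well before that point is kept, so the model cannot simply see the diagnosis. This is a minimal sketch under stated assumptions: the one-year `horizon_days` gap, anchoring controls at their last recorded date, and the helper name `prediction_window` are hypothetical choices, not the study's protocol.

```python
from datetime import date, timedelta

def prediction_window(records, target_codes, horizon_days=365):
    """Return (label, history) for one patient.

    records: list of (date, code) pairs. A patient is labeled 1 if any target
    code appears; the history then keeps only codes recorded at least
    `horizon_days` before the first target code. Patients with no target code
    are labeled 0, and (an ASSUMPTION here) the same horizon is applied before
    their last recorded date.
    """
    if not records:
        return 0, []
    target_days = [d for d, c in records if c in target_codes]
    if target_days:
        label, anchor = 1, min(target_days)
    else:
        label, anchor = 0, max(d for d, _ in records)
    cutoff = anchor - timedelta(days=horizon_days)
    return label, [(d, c) for d, c in records if d <= cutoff]

# Hypothetical record: the target code appears in March 2020.
history = [(date(2018, 1, 1), "I10"), (date(2019, 6, 1), "K21.9"), (date(2020, 3, 1), "J84.112")]
print(prediction_window(history, {"J84.112"}))
```

Here the patient is labeled a case and only the 2018 code survives, because the 2019 code falls inside the one-year gap before the first target code.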





Signature of IPF diagnostic sequence

IPF drugs prescribed: pirfenidone or nintedanib

  • age > 50 years
  • at least two IPF target codes identified at least 1 month apart 
  • chest CT procedure (ICD-9-CM 87.41 and Current Procedural Terminology, 4th Edition, codes 71250, 71260 and 71270) before the first diagnostic claim for IPF
  • no claims for alternative ILD codes occurring on or after the first IPF claim
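The inclusion criteria listed above translate directly into a filter over a patient's claims. The sketch below is illustrative only: the chest CT codes come from the slide, but the IPF and alternative-ILD code sets are assumed placeholders, "1 month" is taken as 30 days, age is assumed to be measured at the first IPF claim, and codes are compared as plain strings without tracking the code system. The prescription signal (pirfenidone or nintedanib) is left out because the slide does not say how it combines with the other criteria.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Claim:
    day: date
    code: str  # diagnosis or procedure code, compared as a plain string

# Chest CT procedure codes as listed on the slide (ICD-9-CM 87.41; CPT-4 71250/71260/71270).
CHEST_CT_CODES = frozenset({"87.41", "71250", "71260", "71270"})
# ASSUMED placeholder code sets; the study's full code lists are not given here.
IPF_CODES = frozenset({"516.31", "J84.112"})
ALT_ILD_CODES = frozenset({"J84.89", "J84.9"})

def age_on(birth: date, day: date) -> int:
    """Whole years of age on `day`."""
    return day.year - birth.year - ((day.month, day.day) < (birth.month, birth.day))

def meets_ipf_definition(birth: date, claims: list[Claim], min_gap_days: int = 30) -> bool:
    """Hypothetical filter applying the four listed inclusion criteria."""
    ipf_days = sorted(c.day for c in claims if c.code in IPF_CODES)
    if len(ipf_days) < 2:
        return False  # need at least two IPF target codes
    first_ipf = ipf_days[0]
    if age_on(birth, first_ipf) <= 50:
        return False  # age > 50, assumed to be measured at the first IPF claim
    if (ipf_days[-1] - first_ipf).days < min_gap_days:
        return False  # earliest and latest IPF codes must be >= min_gap_days apart
    if not any(c.code in CHEST_CT_CODES and c.day < first_ipf for c in claims):
        return False  # a chest CT must precede the first IPF claim
    if any(c.code in ALT_ILD_CODES and c.day >= first_ipf for c in claims):
        return False  # no alternative ILD code on or after the first IPF claim
    return True

# Example: CT in January, then IPF codes two months apart, patient born in 1950.
claims = [
    Claim(date(2015, 1, 10), "71250"),
    Claim(date(2015, 2, 1), "J84.112"),
    Claim(date(2015, 4, 1), "J84.112"),
]
print(meets_ipf_definition(date(1950, 1, 1), claims))  # True
```

Checking only the earliest and latest IPF dates is enough for the spacing rule: if any two codes are at least a month apart, those two are.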
2,053,277 patients included in the study

University of Chicago Medical Center

68,658 patients

Random sample from the OptumLabs Data Warehouse, courtesy of Mayo Clinic

861,280 patients 

2,983,215 patients

Data: Onishchenko et al. Nature Medicine 2022

MarketScan Out-of-sample Results





UCM Out-of-sample Results





False Positives:

  • Healthcare capacity
  • Risk from imaging tests

For every 20-30 flags, 1 is a true positive

  • General likelihood ratio 60-80
  • PPV 3.5-5%
  • Notifying patients 4 years early?
  • No cure, so why screen?
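The screening numbers above are linked by simple arithmetic: a PPV of 3.5-5% means 1/PPV, roughly 20 to 29 flags, per true positive, and a likelihood ratio converts pre-test odds into post-test odds. A minimal sketch (the function names are hypothetical):

```python
def ppv_from_lr(likelihood_ratio: float, prevalence: float) -> float:
    """Post-test probability after a flag: post-test odds = LR * pre-test odds."""
    pre_odds = prevalence / (1.0 - prevalence)
    post_odds = likelihood_ratio * pre_odds
    return post_odds / (1.0 + post_odds)

def flags_per_true_positive(ppv: float) -> float:
    """Average number of flags raised for each patient who truly has the disease."""
    return 1.0 / ppv

# The slide's PPV range, restated as flags per true positive.
for p in (0.035, 0.05):
    print(f"PPV {p:.1%}: about {flags_per_true_positive(p):.1f} flags per true positive")
```

Plugging the earlier likelihood ratio of ~30 into the overall prevalence of ~5 in 10,000, `ppv_from_lr(30, 0.0005)` gives about 0.015. PPV depends on the base rate in whichever population is actually screened, so this shows the mechanics only and is not a reported result.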



Better outcomes

  • early anti-fibrotic therapy seems increasingly promising
  • better shot at lung transplant
  • early diagnosis reduces hospitalizations by a factor of 1-3

Collard, Harold R., Alex J. Ward, Stephan Lanes, D. Cortney Hayflinger, Daniel M. Rosenberg, and Elke Hunsche. "Burden of illness in idiopathic pulmonary fibrosis." Journal of medical economics 15, no. 5 (2012): 829-835.

Alzheimer's Disease and Related Dementia*

* in press

>5 million in the US; >13 million in the next 10 years

Alzheimer's Disease and Related Dementia

MoCA, Blood Tests

Current Practice:

state of art with EHR:

~67% AUC*


ZCoR: ~87%
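AUC has a direct reading: the probability that a randomly chosen patient who goes on to develop the disorder receives a higher risk score than a randomly chosen patient who does not (ties counted as half). A minimal pairwise sketch with made-up scores; the function name `auc` is hypothetical.

```python
def auc(scores_pos: list[float], scores_neg: list[float]) -> float:
    """Probability that a random case outscores a random control (ties count 1/2).

    Pairwise O(n*m) form for clarity; large cohorts would use a rank-based
    (Mann-Whitney U) computation instead.
    """
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Made-up risk scores: three patients who develop the disorder, four who do not.
print(auc([0.9, 0.7, 0.4], [0.8, 0.3, 0.2, 0.1]))
```

Here 10 of the 12 case/control pairs are ordered correctly, so the AUC is 10/12, about 0.83. On this reading, moving from ~67% to ~87% means correctly ordering roughly 87 rather than 67 of every 100 such pairs.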


Predicting ADRD accurately up to a decade in advance

Applicable to Screening for Mild Cognitive Impairment

Clinical Trial Participant Selection

Current screen-failure rate: 80-90%


Estimated rate with ZCoR:


Application to Suicide Attempts and Ideation (SISA) and PTSD*

A perhaps surprising connection between mood disorders and physiological comorbidities

Gibbons RD, Kupfer D, Frank E, Moore T, Beiser DG, Boudreaux ED. Development of a Computerized Adaptive Test Suicide Scale-The CAT-SS. J Clin Psychiatry. 2017 Nov/Dec;78(9):1376-1382. doi: 10.4088/JCP.16m10922. PMID: 28493655.

* in press

Application to Malignant Neoplasms*


Melanoma has a high survival rate of over 90% when treated early. But if it progresses to later stages, the survival rate drops significantly. Identifying potentially life-threatening melanomas is crucial.

* in press

Take-Home Message and Next Steps

  • Medicine is poised to enter a transformative era, bolstered by the emergence of sophisticated artificial intelligence (AI) models.
  • Immense potential to reshape early disease diagnosis, prevention, and treatment strategies.
  • Accelerate scientific discovery toward a deeper understanding of complex etiologies
  • Enable more holistic approaches to medicine, where predictive patterns can be rapidly recognized and exploited

Reading (References)

Onishchenko, Dmytro, Yi Huang, James van Horne, Peter J. Smith, Michael E. Msall, and Ishanu Chattopadhyay. “Reduced False Positives in Autism Screening via Digital Biomarkers Inferred from Deep Comorbidity Patterns.” Science Advances 7, no. 41 (October 8, 2021).


Onishchenko, Dmytro, Daniel S. Rubin, James R. van Horne, R. Parker Ward, and Ishanu Chattopadhyay. “Cardiac Comorbidity Risk Score: Zero‐Burden Machine Learning to Improve Prediction of Postoperative Major Adverse Cardiac Events in Hip and Knee Arthroplasty.” Journal of the American Heart Association 11, no. 15 (August 2, 2022).


Onishchenko, Dmytro, Robert J. Marlowe, Che G. Ngufor, Louis J. Faust, Andrew H. Limper, Gary M. Hunninghake, Fernando J. Martinez, and Ishanu Chattopadhyay. “Screening for Idiopathic Pulmonary Fibrosis Using Comorbidity Signatures in Electronic Health Records.” Nature Medicine 28, no. 10 (September 29, 2022): 2107–16.


Huang, Yi, Victor Rotaru, and Ishanu Chattopadhyay. “Sequence Likelihood Divergence for Fast Time Series Comparison.” Knowledge and Information Systems 65, no. 7 (March 16, 2023): 3079–98.


Brenner, Lisa A., Lisa M. Betthauser, Molly Penzenik, Anne Germain, Jin Jun Li, Ishanu Chattopadhyay, Ellen Frank, David J. Kupfer, and Robert D. Gibbons. “Development and Validation of Computerized Adaptive Assessment Tools for the Measurement of Posttraumatic Stress Disorder Among US Military Veterans.” JAMA Network Open 4, no. 7 (2021): e2115707.

