Looking Beyond Risk Factors:

Generative Bio-AI for Proactive Point-of-Care Early Diagnosis and Reduced Screen Failures in ILD and PF

Ishanu Chattopadhyay, PhD

Assistant Professor of Medicine

University of Chicago

ishanu@uchicago.edu

ishanu@paraknowledge.ai

University of Chicago Medicine

The Laboratory for Zero Knowledge Discovery

mathematics

computer science

social science

medicine

D3M (I2O)

PAI (DSO)

PREEMPT (BTO)

YFA (DSO)

FUNDING

Prognosis at Point-of-Diagnosis 

  • Optimizing Management

Patient Journey 

  • Continuous Risk Monitoring

Early Diagnosis

  • Universal Screening
  • Cohort Selection

Reduce screen failure rates

Holistic health surveillance

Predict antifibrotic continuation

Improve outcomes

1

2

3

Interstitial Lung Disease / Pulmonary Fibrosis

Rapid Universal Point-of-care Screening for ILD/IPF Using Comorbidity Signatures in Electronic Health Records

Flag patients before they (or their doctors) suspect disease

Primary Care

Pulmonologist

Zero-burden Co-morbid Risk Score (ZCoR)

Referral

shortness of breath

dry cough

doctor can hear velcro crackles

Non-specific Symptoms

>50 years old

more men than women

IPF

Rare disease

~5 in 10,000

Post-Dx

Survival

~4 years

Cannot always be seen on CXR

At least one misdiagnosis

~55%

Two or more misdiagnoses

38%

Initially attributed to age-related symptoms

72%

PCP workflow demands

Known Co-morbidities of PF

Are there more? Subtler, more heterogeneous footprints in the medical history?

Timeline figure: ZCoR screening vs. current clinical Dx (~4-yr intervals marked; current survival ~4 yrs)

Onishchenko, D., Marlowe, R.J., Ngufor, C.G. et al. Screening for idiopathic pulmonary fibrosis using comorbidity signatures in electronic health records. Nat Med 28, 2107–2116 (2022). https://doi.org/10.1038/s41591-022-02010-y

n=~3M

AUC~90%

Likelihood ratio ~30

Conventional AI/ML attempts to model the physician

AI in IPF Research

  • Co-morbidity Patterns
  • No additional data demands
  • Use whatever data is already in the patient file

Figure: case/control design from ICD administrative codes (IPF, ILD). Cases: target codes appear after the past medical history; controls: no target codes appear. Predictions are made from the past medical history; 2-yr windows are marked on the timeline.
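A minimal sketch of the case/control split, assuming each diagnosis history is a list of (date, ICD code) pairs and that a case's prediction may use only history recorded before its first target code. The helper name `label_and_truncate` is hypothetical and exact code matching is a simplification; this is not the published pipeline.

```python
from datetime import date


def label_and_truncate(dx_record, target_codes):
    """Label one patient as case or control (hypothetical helper).

    dx_record: list of (datetime.date, ICD code) pairs, in any order.
    target_codes: set of codes treated as the target (e.g. IPF codes).
    ASSUMPTION: a case keeps only history strictly before its first
    target code; a control keeps its full history.
    """
    events = sorted(dx_record)
    target_dates = [d for d, code in events if code in target_codes]
    if not target_dates:
        return "control", events
    first = target_dates[0]
    return "case", [(d, c) for d, c in events if d < first]


history = [(date(2015, 3, 1), "R06.00"),
           (date(2016, 5, 2), "J84.112"),
           (date(2014, 1, 9), "R05")]
label, visible = label_and_truncate(history, {"J84.112"})
# label == "case"; visible holds the two codes recorded before J84.112
```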

Signature of IPF diagnostic sequence

IPF drugs prescribed: pirfenidone or nintedanib

  • age > 50 years
  • at least two IPF target codes identified at least 1 month apart 
  • chest CT procedure (ICD-9-CM 87.41 and Current Procedural Terminology, 4th Edition, codes 71250, 71260 and 71270) before the first diagnostic claim for IPF
  • no claims for alternative ILD codes occurring on or after the first IPF claim
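The four claims-based criteria can be expressed as a single filter. This is a sketch under stated assumptions: J84.112 (ICD-10-CM) and 516.31 (ICD-9-CM) stand in for the IPF target codes, age is measured at the first IPF claim, "1 month" is approximated as 30 days, and the alternative-ILD code set is left as a hypothetical parameter. The chest CT codes are the ones listed above.

```python
from datetime import date

# ASSUMPTION: illustrative IPF target codes (ICD-10-CM J84.112, ICD-9-CM 516.31).
IPF_CODES = {"J84.112", "516.31"}
# Chest CT codes as listed in the criteria (ICD-9-CM 87.41; CPT 71250/71260/71270).
CT_CODES = {"87.41", "71250", "71260", "71270"}


def age_on(birth_date, on_date):
    """Whole years of age on a given date."""
    had_birthday = (on_date.month, on_date.day) >= (birth_date.month, birth_date.day)
    return on_date.year - birth_date.year - (0 if had_birthday else 1)


def meets_ipf_case_definition(birth_date, dx_record, proc_record,
                              alt_ild_codes, min_gap_days=30):
    """Apply the four case criteria to one patient (sketch).

    dx_record / proc_record: lists of (datetime.date, code) pairs.
    alt_ild_codes: HYPOTHETICAL set of alternative ILD codes.
    ASSUMPTION: "at least 1 month apart" is approximated as min_gap_days.
    """
    ipf_dates = sorted(d for d, c in dx_record if c in IPF_CODES)
    if len(ipf_dates) < 2:
        return False
    first_ipf = ipf_dates[0]
    # Age > 50, measured at the first IPF claim (assumed reference point).
    if age_on(birth_date, first_ipf) <= 50:
        return False
    # Two IPF claims at least min_gap_days apart exist iff last - first >= gap.
    if (ipf_dates[-1] - first_ipf).days < min_gap_days:
        return False
    # A chest CT procedure before the first IPF claim.
    if not any(c in CT_CODES and d < first_ipf for d, c in proc_record):
        return False
    # No alternative ILD claim on or after the first IPF claim.
    return not any(c in alt_ild_codes and d >= first_ipf for d, c in dx_record)
```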

ICD Codes can be noisy

"cases" are not always true IPF

Truven MarketScan (IBM)
Commercial Claims & Encounters Database
2003-2018

>100M patients visible 

>7B individual claims

>87K unique diagnostic codes

>7% Medicare data present

2,053,277 patients included in study

University of Chicago Medical Center 
2012-2021

68,658 patients

Random sample from the OptumLabs Data Warehouse, courtesy of Mayo Clinic

861,280 patients 

2,983,215 patients in total

Data: Onishchenko et al. Nat. Med. 2022

High likelihood ratios achieved irrespective of subgroup

performance tables

Out-of-sample Results

specificity ~99%

NPV >99.9%

IPF

ILD

Comorbidity Spectra

patient A

patient B

patient C

Beyond "risk factors" to personalized risk patterns

False Positives: 

  • Healthcare Capacity

Ethics:

  • Risk from Imaging Tests

For every 20-30 flags,

1 is positive

  • General likelihood ratio 60-80
  • PPV 3.5-5%
  • Notifying patients 4 years early?
  • No cure: why screen?

minimal

acceptable?
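The flag arithmetic on this slide follows from Bayes' rule in odds form: post-test odds equal the likelihood ratio times the prior odds, and the number of flags per true case is 1/PPV. The helpers below are an illustrative sketch with hypothetical names; NPV is included for comparison with the out-of-sample NPV quoted earlier.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)


def ppv_from_lr(lr_pos, prevalence):
    """Probability that a flagged patient is a true case (Bayes, odds form)."""
    prior_odds = prevalence / (1.0 - prevalence)
    post_odds = lr_pos * prior_odds
    return post_odds / (1.0 + post_odds)


def npv(sensitivity, specificity, prevalence):
    """Probability that an unflagged patient is truly negative."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


def flags_per_true_positive(ppv):
    """Average number of flags that contain one true case."""
    return 1.0 / ppv
```

For example, `flags_per_true_positive(0.05)` gives 20 and `flags_per_true_positive(0.035)` gives about 28.6, which reproduces the "20-30 flags" range for a PPV of 3.5-5%. Note that the PPV reached for a given likelihood ratio depends strongly on the prevalence in the screened population.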

Better outcomes

  • early anti-fibrotic therapy seems increasingly promising
  • better shot at lung transplant
  • early Dx reduces hospitalizations by a factor of 1-3

Collard, Harold R., Alex J. Ward, Stephan Lanes, D. Cortney Hayflinger, Daniel M. Rosenberg, and Elke Hunsche. "Burden of illness in idiopathic pulmonary fibrosis." Journal of Medical Economics 15, no. 5 (2012): 829-835.

Clinical Trial Cohort Selection

Current screen failure rate ~50-60%

ZCoR-boosted screen failure rate ~20%
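To make the screen-failure figures concrete, the expected number of candidates screened per enrolled patient is 1/(1 - failure rate). A small sketch, assuming screen failures occur independently at the stated rate; the function name is hypothetical.

```python
import math


def screens_needed(n_enroll, failure_rate):
    """Expected number of candidates to screen to enroll n_enroll patients.

    ASSUMPTION: each screened candidate fails independently with
    probability failure_rate.
    """
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    # Small tolerance guards against floating-point round-off before ceil.
    return math.ceil(n_enroll / (1.0 - failure_rate) - 1e-9)
```

With these rates, enrolling 100 patients takes about 223 screens at a 55% failure rate (`screens_needed(100, 0.55)`) versus 125 at the ZCoR-boosted 20% (`screens_needed(100, 0.20)`).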

Cloud Deployment

Theoretical formulation

Multi-cohort validation

Launch User-Accessible Platform

3 years

2 years

Data In

[
    {
        "patient_id": "P000038",
        "sex": "F",
        "birth_date": "01-01-2006",
        "DX_record": [
            {"date": "07-31-2006", "code": "Z38.00"},
            {"date": "08-07-2006", "code": "P59.9"},
            {"date": "08-29-2016", "code": "J01.90"},
            {"date": "09-10-2016", "code": "J01.90"},
            {"date": "11-14-2016", "code": "J01.91"}
        ],
        "RX_record": [
            {"date": "10-29-2011", "code": "rxLDA017"},
            {"date": "05-16-2015", "code": "rxIDG004"},
            {"date": "08-08-2015", "code": "rxIDG004"},
            {"date": "06-04-2016", "code": "rxIDD013"}
        ],
        "PROC_record": [
            {"date": "02-05-2007", "code": "90723"},
            {"date": "11-05-2007", "code": "J1100"}
        ]
    }
]

Data Out

{
  "predictions": [
    {
      "error_code": "",
      "patient_id": "P000012",
      "predicted_risk": 0.005794344620009157,
      "probability": 0.8253881317184486
    }
  ],
  "target": "TARGET"
}
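A sketch of assembling a request body in the "Data In" layout and reading a "Data Out" reply. Field names are copied from the examples; the MM-DD-YYYY date format is inferred from them, and the helper names are hypothetical rather than part of any published client.

```python
import json
from datetime import date


def make_patient_record(patient_id, sex, birth_date, dx=(), rx=(), proc=()):
    """Build one input record in the "Data In" layout (hypothetical helper).

    dx, rx, proc: iterables of (datetime.date, code) pairs.
    ASSUMPTION: dates are serialized as MM-DD-YYYY, as in the example.
    """
    def fmt(events):
        return [{"date": d.strftime("%m-%d-%Y"), "code": c}
                for d, c in sorted(events)]

    return {
        "patient_id": patient_id,
        "sex": sex,
        "birth_date": birth_date.strftime("%m-%d-%Y"),
        "DX_record": fmt(dx),
        "RX_record": fmt(rx),
        "PROC_record": fmt(proc),
    }


def read_predictions(response_text):
    """Map patient_id -> (predicted_risk, probability) from a "Data Out"
    reply, skipping entries whose error_code is non-empty."""
    reply = json.loads(response_text)
    return {p["patient_id"]: (p["predicted_risk"], p["probability"])
            for p in reply["predictions"] if not p.get("error_code")}
```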

The Paraknowledge API

curl -X POST -H "Content-Type: application/json" -d '[{"patient_id": "P28109965201", "sex": "M", "age": 89, "fips": "35644", "DX_record": [{"date": "12-16-2011", "code": "R09.02"}, {"date": "12-30-2011", "code": "H04.129"}, {"date": "12-30-2011", "code": "H02.109"}], "RX_record": [], "PROC_record": [{"date": "09-28-2012", "code": "71100"}]}]' "https://us-central1-pkcsaas-01.cloudfunctions.net/zcor_predict?target=IPF&api_key=7eea9f70d79c408f2b69847d911303c"
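The same call from Python, using only the standard library. The endpoint and the `target`/`api_key` query parameters are taken from the curl example; `build_zcor_request` and `send` are hypothetical helpers, and you would substitute your own key. Building the request is kept separate from sending it so the payload can be inspected offline.

```python
import json
import urllib.parse
import urllib.request

# Endpoint copied from the curl example; the API key is supplied by the caller.
ZCOR_URL = "https://us-central1-pkcsaas-01.cloudfunctions.net/zcor_predict"


def build_zcor_request(records, target, api_key, url=ZCOR_URL):
    """Build (but do not send) a POST request mirroring the curl call."""
    query = urllib.parse.urlencode({"target": target, "api_key": api_key})
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        f"{url}?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and decode the JSON reply (requires network access)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```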

Current Targets

IPF
ILD
ADRD
CKD
CKD_SEVERE
MELANOMA
CANCER_PANCREAS
CANCER_UTERUS
SISA

Cohort Selection and Risk Analysis Testbed

Up to 4-year "signal" resolution

decreases risk

increases risk

Patient Journey: Tracking Risk over time

Risk decreases sometimes

New codes change the trajectory as they are revealed
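Tracking risk over time amounts to re-scoring the same patient as new codes arrive, using only what was visible at each checkpoint. In this sketch `score_fn` is a hypothetical placeholder for any trained model; the toy scorer in the usage note exists only to show that a score can fall as well as rise.

```python
from datetime import date


def risk_trajectory(dx_record, checkpoints, score_fn):
    """Re-score a patient at each checkpoint using only codes seen so far.

    dx_record: list of (datetime.date, code) pairs.
    checkpoints: dates at which to recompute risk.
    score_fn: HYPOTHETICAL model mapping a list of codes to a risk score.
    """
    events = sorted(dx_record)
    trajectory = []
    for t in sorted(checkpoints):
        visible = [code for d, code in events if d <= t]
        trajectory.append((t, score_fn(visible)))
    return trajectory
```

With a toy scorer that returns the fraction of codes beginning with "R0", a history of R05 (2015), I10 (2016) and R06.00 (2017) scored in mid-2015, mid-2016 and mid-2017 yields 1.0, 0.5 and about 0.67: the score dips when an unrelated code appears, then climbs again.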

Off-the-shelf AI does not suffice

Modeling Longitudinal Patterns

Specialized HMM models from code sequences

Model case and control cohorts separately

Given a new test case, compute the likelihood of the sample arising from the case models vs. the control models

sequence likelihood defect

Huang, Yi, Victor Rotaru, and Ishanu Chattopadhyay. "Sequence likelihood divergence for fast time series comparison." Knowledge and Information Systems 65, no. 7 (2023): 3079-3098.
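The case-versus-control comparison can be illustrated with a much simpler model class. The sketch below swaps in first-order Markov chains with add-one smoothing in place of the specialized HMMs, and scores a new sequence by its log-likelihood ratio under the case and control models; it is not the sequence likelihood defect of the cited paper.

```python
import math
from collections import Counter


class MarkovChain:
    """First-order Markov model over a finite code alphabet.

    A deliberately simple stand-in for the specialized HMMs; add-one
    (Laplace) smoothing keeps unseen transitions at non-zero probability.
    """

    def __init__(self, sequences, alphabet):
        self.k = len(set(alphabet))
        self.counts = Counter()   # (from_code, to_code) -> count
        self.totals = Counter()   # from_code -> outgoing transition count
        for seq in sequences:
            for a, b in zip(seq, seq[1:]):
                self.counts[(a, b)] += 1
                self.totals[a] += 1

    def log_likelihood(self, seq):
        """Sum of log transition probabilities along the sequence."""
        return sum(
            math.log((self.counts[(a, b)] + 1) / (self.totals[a] + self.k))
            for a, b in zip(seq, seq[1:])
        )


def log_likelihood_ratio(seq, case_model, control_model):
    """Positive: the sequence is better explained by the case model."""
    return case_model.log_likelihood(seq) - control_model.log_likelihood(seq)
```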

ZeD Lab: Predictive Screening from Comorbidity Footprints

Nature Medicine

JAHA

Cell Reports

Science Adv.

The ZCoR Approach: Rapidly Retargetable

Disorder | ZeD performance | Competition
Autism | >80% AUC at 2 yrs | "obvious"
Alzheimer's Disease | ~90% AUC | 60-70% AUC
Idiopathic Pulmonary Fibrosis | ~90% AUC | NA
MACE | ~80% AUC | ~70% AUC
Bipolar Disorder | ~85% AUC | NA
CKD | ~85% AUC | NA
Cancers (Prostate, Bladder, Uterus, Skin) | ~75-80% AUC | Low

Deploy all/many/most of these!

Predictions at the Point-of-Diagnosis

Can my patient continue taking anti-fibrotics over the long term?

Digital Twins for Health trajectories

Figure: component models \(\rho_1, \rho_2, \ldots, \rho_i, \ldots, \rho_m\), annotated "1M parameters"

Predicts disorders across the disease spectrum

Pre-empting Effectiveness of Antifibrotics at the point of diagnosis

~78% AUC

26-32 out of 100 discontinued 

4-5 out of 100 discontinued

Prognosis at Point-of-Diagnosis 

  • Optimizing Management

Patient Journey 

  • Continuous Risk Monitoring

Early Diagnosis

  • Universal Screening
  • Cohort Selection

Reduce screen failure rates

Holistic health surveillance

Predict antifibrotic continuation

Improve outcomes

Summary

3

2

1

ishanu@uchicago.edu

@ishanu_ch

ishanu@paraknowledge.ai

Take-Home Message

Conclusions

  • Medicine is poised to enter a transformative era, driven by the emergence of sophisticated artificial intelligence (AI) models.
  • AI has immense potential to reshape early disease diagnosis, prevention, and treatment strategies.
  • It can accelerate scientific discovery toward a deeper understanding of complex etiologies.
  • It can enable more holistic approaches to medicine, in which predictive patterns are rapidly recognized and exploited.

$$\Phi_i:\prod_{j \neq i} \Sigma_j \rightarrow \mathcal{D}(\Sigma_i)$$

Q-Net

recursive forest

This is a general method!

Data

\(\downarrow \)

Set of interdependent predictors
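A Q-net fits one predictor \(\Phi_i\) per variable, each returning a distribution over \(\Sigma_i\) given the remaining variables \(x_{-i}\). The slide's predictors are recursive forests; purely for illustration, the sketch below replaces them with exact-match frequency tables that fall back to the marginal distribution. `fit_predictors` is a hypothetical name.

```python
from collections import Counter


def fit_predictors(rows, alphabets):
    """Fit one conditional predictor per variable i (toy sketch).

    rows: list of equal-length sequences of categorical values (the data).
    alphabets: alphabets[i] lists the possible values Sigma_i of variable i.
    SWAPPED-IN TECHNIQUE: Phi_i(x_-i) is the empirical distribution of
    variable i among rows whose other variables match x_-i exactly, with
    the marginal of variable i as a fallback (not a recursive forest).
    """
    rows = [tuple(r) for r in rows]
    predictors = []
    for i in range(len(alphabets)):
        marginal = Counter(r[i] for r in rows)

        def phi(x_minus_i, i=i, marginal=marginal):
            key = tuple(x_minus_i)
            matches = Counter(r[i] for r in rows if r[:i] + r[i + 1:] == key)
            counts = matches if matches else marginal
            total = sum(counts.values())
            return [counts[v] / total for v in alphabets[i]]

        predictors.append(phi)
    return predictors
```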

q-distance

a biologically informed, adaptive distance between strains

$$\theta(x,y) \triangleq \mathbf{E}_i \left( \mathbb{J}^{\frac{1}{2}} \left( \Phi_i^P(x_{-i}), \Phi_i^Q(y_{-i}) \right) \right)$$

This distance is "special"

Smaller distances imply a quantifiably higher probability of a spontaneous jump between strains

Here \(\mathbb{J}\) is the Jensen-Shannon divergence.
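Given the two models' predicted distributions, \(\theta(x,y)\) reduces to an average of square-root Jensen-Shannon divergences. In this sketch the expectation \(\mathbf{E}_i\) is assumed to be a uniform mean over variables, the divergence uses base-2 logarithms so each term lies in [0, 1], and the per-variable distributions \(\Phi_i^P(x_{-i})\) and \(\Phi_i^Q(y_{-i})\) are passed in directly rather than computed from fitted models.

```python
import math


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a_k = 0 contribute nothing; b_k > 0 whenever a_k > 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def q_distance(phi_p, phi_q):
    """theta(x, y) as a mean over i of sqrt(JSD(phi_p[i], phi_q[i])).

    phi_p[i], phi_q[i]: predicted distributions for variable i under the
    two models. ASSUMPTION: E_i is a uniform average over variables.
    """
    # max(..., 0.0) guards against tiny negative values from round-off.
    terms = [math.sqrt(max(jensen_shannon(p, q), 0.0))
             for p, q in zip(phi_p, phi_q)]
    return sum(terms) / len(terms)
```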

Metric Structure

Tangent Bundle

geometry

dynamics

Is AI/ML adding anything of relevance?

"predicting" autism > 3 yrs

"diagnosing" fibrosis from lung imaging

"diagnosing" dementia from brain scans

State of the Art for Universal Screening

  • Autism
  • Idiopathic Pulmonary Fibrosis
  • Alzheimer's Disease and related dementias
  • Suicidality, PTSD
  • Perioperative Cardiac Event
  • Aggressive Melanoma
  • Uterine Cancer
  • Pancreatic Cancer
  • ...

  • Non-existent biomarkers
  • Expensive, time-consuming diagnostic tests
  • Lack of universal screening at the point of care
  • Early diagnosis is difficult; late or missed diagnosis costs lives