Ishanu Chattopadhyay
University of Chicago
CCTS 40500 / CCTS 20500 / BIOS 29208
Winter 2023
Contact
ishanu@uchicago.edu
900 E57 ST
KCBD 10152
Room: BSLC 313
Monday 9.30 AM - 12.20 PM
Resources
https://github.com/zeroknowledgediscovery/course_notes
RCC Midway
Class Time
MON 9.30 AM
(~3 hrs)
FRI 9.00 AM
(0.5-1 hr if you have questions)
What is Machine Learning?
Why is it everywhere?
Why is it important to biomedicine?
Are we really solving the hard problems?
Learning from machines?
Learning with the help of computers?
Modeling data?
Regression?
Statistics
AI
Data Mining
Deep Learning
“Machine learning is essentially a form of applied statistics”
“Machine learning is statistics scaled up to big data”
“Machine learning is Statistics minus any checking of models and assumptions.”
“I don’t know what Machine Learning will look like in ten years, but whatever it is I’m sure Statisticians will be whining that they did it earlier and better.”
Approach to a problem differs between mathematicians, statisticians & ML-experts
Two features
different models produce different solutions
Data Science
Big Data Analytics
Is there any good reason to assume that data that you have not seen yet will share any properties with data you have already seen?
Data
Knowledge
Towards a grand unified theory of data
lots of data!
Classical Science
The age of data
Pandemics
Emergent Pathogens
Social Dynamics
Complex Diseases
Data
Forecast case count
Predict future mutations
Predict crime
Diagnose Complex Diseases
Data
Insight
scientific knowledge
Clinical Decisions
social theory
Designing Better Vaccines
Can we predict future variants?
Bio-NORAD
Are we prepared for the next pandemic?
Future:
NORAD
for biological threats
Microbiome
Modeling complex ecosystems
e.g.
human gut
microbiome
Truven MarketScan (IBM) Commercial Claims & Encounters Database 2003-2018
87M patients visible > 1 year
>7B individual claims
>87K unique diagnostic codes
>7% Medicare data present
*CoR: Comorbid Risk Scores
ACoR
PCoR
ZCoR
Universality
Autism
Bipolar Disorder
Idiopathic Pulmonary Fibrosis
Alzheimer's Disease
Perioperative Cardiac Event
Chronic Kidney Disease
...
Conventional Off-the-shelf ML will not do!
ASD: Ineffective screening causes delays and incurs costs
Current Prevalence: 1 in 59
Children with ASD experience a higher rate of co-morbidities
Can we exploit these patterns to predict diagnosis?
Common Knowledge: Comorbidities Exist
Autism Co-morbid Risk (ACoR) Score
MCHAT/F
Head to head comparison with current practice
ACoR: Variation with Age
can track risk increase over time
Older children are easier to diagnose
Co-morbidity Spectra: Pattern Discovery amidst Heterogeneity
Top patterns come from:
Nervous disorders
Digestive disorders
Injury & Poisoning
Neoplasms
Endocrine
Immune
Deep Learning Without Neural Networks: Fractal-nets for Rare Event Modeling (Under Review Nature Machine Intelligence)
Yi Huang, James Evans, I. Chattopadhyay
Sequence Likelihood Divergence For Fast Time Series Comparison
Yi Huang, Victor Rotaru, I. Chattopadhyay
Under Review IEEE Transactions on Knowledge and Data Engineering
Abductive learning of quantized stochastic processes with probabilistic finite automata
Ishanu Chattopadhyay and Hod Lipson
2013 Phil. Trans. R. Soc. A 371: 20110543
Immune female control
Immune female case
Secret Sauce: Leveraging Temporal Patterns
Specialized HMM models from code sequences
Model control and case cohorts separately
Given a new test case, compute the likelihood of the sample arising from case models vs control models
sequence likelihood defect
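A rough sketch of the case-vs-control likelihood comparison described above. It makes several assumptions not in the slides: the specialized HMMs are swapped for plain first-order Markov chains with add-one smoothing, the function names are hypothetical, and the code sequences are made-up toy data. The log-likelihood gap computed here is only a simple stand-in and is not claimed to be the sequence likelihood defect itself.

```python
import math
from collections import defaultdict

def fit_markov_chain(sequences, alphabet, alpha=1.0):
    """Estimate first-order transition probabilities with add-alpha smoothing."""
    counts = {a: defaultdict(float) for a in alphabet}
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1.0
    model = {}
    for a in alphabet:
        total = sum(counts[a].values()) + alpha * len(alphabet)
        model[a] = {b: (counts[a][b] + alpha) / total for b in alphabet}
    return model

def log_likelihood(seq, model):
    """Log-probability of the observed transitions under the model."""
    return sum(math.log(model[p][n]) for p, n in zip(seq, seq[1:]))

def likelihood_gap(seq, case_model, control_model):
    """Positive values mean the sequence looks more like the case cohort."""
    return log_likelihood(seq, case_model) - log_likelihood(seq, control_model)

# Toy stand-ins for diagnostic-code sequences (assumption: 3 code categories).
ALPHABET = ["A", "B", "C"]
case_seqs = [list("ABABABAB"), list("BABABABA")]
control_seqs = [list("AAACCCAA"), list("CCCAAACC")]

# Model the two cohorts separately, as on the slide.
case_model = fit_markov_chain(case_seqs, ALPHABET)
control_model = fit_markov_chain(control_seqs, ALPHABET)

# Score new, unseen sequences against both cohort models.
gap_case_like = likelihood_gap(list("ABABAB"), case_model, control_model)
gap_control_like = likelihood_gap(list("AAACCC"), case_model, control_model)
print(gap_case_like, gap_control_like)
```

The sign of the gap is what matters in this sketch: a sequence that alternates like the toy case cohort scores positive, one that resembles the toy control cohort scores negative.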
Bipolar Disorder
Manic Episodes in Mood Disorders
No Blood-work
No questionnaire
Dx codes + Rx Codes
Idiopathic Pulmonary Fibrosis
No effective screening available
Pathobiology unclear
Post diagnostic survival: 3-5 years
Significant Boost in survival time
Alzheimer's Disease and Related Dementia
>5 Million in US. >13 Million in next 10 years
Alzheimer's Disease and Related Dementia
State of the art with EHR:
~67% AUC*
ZCoR: ~87%
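For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative one (0.5 is chance, 1.0 is perfect ranking). A minimal sketch of that pairwise definition follows; the function name, labels, and scores are made up for illustration and are unrelated to the results quoted on the slide.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical risk scores: 3 patients with the diagnosis, 4 without.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

In this toy example 11 of the 12 positive-negative pairs are ordered correctly, so the AUC is 11/12. The pairwise loop is quadratic in the cohort size; it is written this way for clarity, not for speed.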
Preempting ADRD accurately up to a decade into the future
Perioperative Cardiac Risk from Hip/Knee Surgeries
Impact on patient outcome
Prospective Validation
ASD
ADRD
Pediatrics
Neurology
(Memory Center)
Using Flu incidence data from the past to predict COVID-19 case counts
Predicting rare and extreme events in complex dynamical systems
rare weather events
earthquakes
crime
Fractal Net Architecture: Rethinking Deep Learning in Stochastic Rare/Extreme Event Scenario
Predicting crime and auditing enforcement biases
ECG
EEG
Microbiome
EHR
Genomic
Epidemiology
Tissue Image
Sequence
NN Computation
short-hand
NN Learning: Backpropagation
optimizing weights and biases
by minimizing a loss-function
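The idea of optimizing weights and biases by minimizing a loss function can be sketched in a few lines. The example below assumes a tiny network (one input, four tanh hidden units, one linear output), a mean-squared-error loss, plain full-batch gradient descent, and a made-up target y = x^2. All names and hyperparameters are illustrative choices, not taken from the slides.

```python
import math
import random

def init_params(hidden, rng):
    """Random weights, zero biases."""
    return {
        "w1": [rng.uniform(-1, 1) for _ in range(hidden)],
        "b1": [0.0] * hidden,
        "w2": [rng.uniform(-1, 1) for _ in range(hidden)],
        "b2": 0.0,
    }

def forward(p, x):
    """NN computation: hidden activations, then a linear read-out."""
    h = [math.tanh(w * x + b) for w, b in zip(p["w1"], p["b1"])]
    y_hat = sum(w * hj for w, hj in zip(p["w2"], h)) + p["b2"]
    return h, y_hat

def mse(p, xs, ys):
    """Loss function: mean squared error over the data set."""
    return sum((forward(p, x)[1] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def backprop_step(p, xs, ys, lr=0.05):
    """One gradient-descent update; gradients come from the chain rule."""
    n, hidden = len(xs), len(p["w1"])
    g_w1, g_b1, g_w2 = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden
    g_b2 = 0.0
    for x, y in zip(xs, ys):
        h, y_hat = forward(p, x)
        d_out = 2.0 * (y_hat - y) / n            # dLoss / dy_hat
        g_b2 += d_out
        for j in range(hidden):
            g_w2[j] += d_out * h[j]
            # Propagate back through the tanh: d tanh(z)/dz = 1 - tanh(z)^2
            d_pre = d_out * p["w2"][j] * (1.0 - h[j] ** 2)
            g_w1[j] += d_pre * x
            g_b1[j] += d_pre
    for j in range(hidden):
        p["w1"][j] -= lr * g_w1[j]
        p["b1"][j] -= lr * g_b1[j]
        p["w2"][j] -= lr * g_w2[j]
    p["b2"] -= lr * g_b2

# Toy data (assumption): 11 points of y = x^2 on [-1, 1].
xs = [i / 5.0 - 1.0 for i in range(11)]
ys = [x * x for x in xs]

params = init_params(hidden=4, rng=random.Random(0))
loss_before = mse(params, xs, ys)
for _ in range(2000):
    backprop_step(params, xs, ys)
loss_after = mse(params, xs, ys)
print(loss_before, loss_after)
```

Deep learning frameworks compute these gradients automatically; writing them out by hand here is only meant to show what "backpropagation" refers to.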
API has been trained on the COCO dataset (Common Objects in Context).
?
End of First Class
HW: