Ishanu Chattopadhyay

University of Chicago

Machine Learning & Advanced Analytics for Biomedicine

CCTS 40500 / CCTS 20500 / BIOS 29208
Winter 2023

Contact

ishanu@uchicago.edu

900 E57 ST

KCBD 10152

https://uchicagomedicine.zoom.us/j/94596164273?pwd=am1GS1drOVRPdUk4SDNaaGRVTEhPUT09

Room: BSLC 313

Monday 9.30 AM - 12.20 PM

https://join.slack.com/t/mlbio2023/shared_invite/zt-1mshbyy5f-nmhGiYSUXdXoVDmVe_Qf4Q

Resources

https://github.com/zeroknowledgediscovery/course_notes

RCC Midway

Expectations
Grading
Midterm
Final Project

You should be able to model complex data on your own
Choose the right framework for the right parameters, for the right reasons
Know the limitations and the strengths of your model

Grading

I grade on progress and effort
Innovative approaches get more credit
There will be homework assignments, periodically rather than weekly

Midterm and Final Project

The midterm and the final project are not in-class exams
Again, effort and innovation get more credit

Class Time

MON 9.30 AM

(~3 hrs)

FRI 9.00 AM

(0.5-1 hr if you have questions)

Today's Take-Home Message

What is Machine Learning?

Why is it everywhere?

Why is it important to biomedicine?

Are we really solving the hard problems?

What is Machine Learning?

Learning from machines?

Learning with the help of computers?

Modeling data?

Regression?

data -> (intelligent) automated analysis -> actionable insights

How is Machine Learning different from...

Statistics

Data Mining

Deep Learning

“Machine learning is essentially a form of applied statistics”

“Machine learning is statistics scaled up to big data”

“Machine learning is Statistics minus any checking of models and assumptions.”

“I don’t know what Machine Learning will look like in ten years, but whatever it is I’m sure Statisticians will be whining that they did it earlier and better.”

Approach to a problem differs between mathematicians, statisticians & ML-experts

Central Limit Theorem
Measure Theory
Stochastic Processes

Linear Regression
General Linear Models
What is the "correct" statistical model for a problem/process?
Often the interest is in "describing" data already observed

No model is correct.
The useful ones predict correctly more often than others
ONLY interested in how well a model works on unseen data
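
The last point can be made concrete with a held-out split. What follows is a minimal sketch, assuming synthetic data drawn from a hypothetical line y = 2x + 1 with Gaussian noise; the names `fit_line`, `memorizer`, and `mse` are illustrative and not taken from the course notes. A model that memorizes the training set reaches zero training error, which by itself says nothing about how it behaves on unseen data.

```python
import random

def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mse(predict, xs, ys):
    """Mean squared error of a prediction function on (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic data (an assumption for illustration): y = 2x + 1 plus noise.
rng = random.Random(0)
xs = [rng.uniform(0.0, 10.0) for _ in range(200)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 1.0) for x in xs]

# Hold out the last 50 points; the models never see them during fitting.
train_x, train_y = xs[:150], ys[:150]
test_x, test_y = xs[150:], ys[150:]

slope, intercept = fit_line(train_x, train_y)

def line(x):
    """Fitted straight line."""
    return slope * x + intercept

def memorizer(x):
    """Return the target of the closest training input (pure memorization)."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

# For each model: (error on training data, error on held-out data).
results = {
    name: (mse(model, train_x, train_y), mse(model, test_x, test_y))
    for name, model in [("line", line), ("memorizer", memorizer)]
}

for name, (train_err, test_err) in results.items():
    print(f"{name:10s} train MSE = {train_err:.3f}   test MSE = {test_err:.3f}")
```

Only the second number in each row, the error on the held-out points, speaks to the question on this slide.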

Decision Surfaces with Different Classification Algorithms

Two features

different models produce different solutions
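
A small sketch of the same idea, assuming a hand-made toy dataset with two features and two classes. Nearest-centroid and 1-nearest-neighbor are chosen here only for illustration and are not necessarily the algorithms in the slide's figure; the function names are hypothetical. Both models are fit to the same points, yet they label parts of the feature plane differently.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def nearest_centroid(point, data):
    """Predict the class whose mean feature vector is closest to the point."""
    centroids = {}
    for label in sorted({lbl for _, lbl in data}):
        members = [p for p, lbl in data if lbl == label]
        centroids[label] = (
            sum(p[0] for p in members) / len(members),
            sum(p[1] for p in members) / len(members),
        )
    return min(centroids, key=lambda lbl: sq_dist(point, centroids[lbl]))

def one_nn(point, data):
    """Predict the class of the single closest training point."""
    return min(data, key=lambda item: sq_dist(point, item[0]))[1]

# Toy training set (an assumption): class 0 has one point, (4, 4),
# sitting close to the class 1 cluster.
data = [
    ((0, 0), 0), ((1, 0), 0), ((0, 1), 0), ((4, 4), 0),
    ((5, 5), 1), ((5, 4), 1), ((4, 5), 1), ((6, 6), 1),
]

# Evaluate both models on an integer grid and mark where they disagree.
disagreements = 0
for y in range(6, -1, -1):
    row = []
    for x in range(7):
        a = nearest_centroid((x, y), data)
        b = one_nn((x, y), data)
        if a == b:
            row.append(str(a))
        else:
            row.append("*")
            disagreements += 1
    print(" ".join(row))

print("grid points where the two models disagree:", disagreements)
```

In the printed grid, `0` and `1` mark points where both models agree and `*` marks points where their decision surfaces differ.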

Data Science

Big Data Analytics

Data Science = Automated Analytics

How Do We Teach Machines To...

Is there any good reason to assume that data that you have not seen yet will share any properties with data you have already seen?

Broad ML Categories

A Bird's Eye View

http://www.r2d3.us/visual-intro-to-machine-learning-part-1/

http://www.r2d3.us/visual-intro-to-machine-learning-part-2/

ML Applications in Biomedicine

Uncharted Possibilities

Predicting future disease
Optimizing interventions
Discovering unknown mechanisms

A new paradigm of scientific discovery
Pattern discovery at a scale that is impossible otherwise

Data

Knowledge

Towards a grand unified theory of data

lots of data!

Classical Science

The age of data

Pandemics -> Data -> Forecast case count
Emergent Pathogens -> Data -> Predict future mutations
Social Dynamics -> Data -> Predict crime
Complex Diseases -> Data -> Diagnose Complex Diseases

Data

Data

Insight

scientific knowledge

Clinical Decisions

social theory