Cam Davidson-Pilon
PyData Piraeus Meetup 🎉
July 31, 2020
Classification
Regression
Causal Inference
Dim. Reduction
....
Survival Analysis
*Later, we will generalize this
T | E |
---|---|
60 | 1 |
60 | 1 |
60 | 0 |
15 | 0 |
69 | 1 |
45 | 1 |
17 | 1 |
48 | 1 |
60 | 1 |
... | ... |
As are:
👍
👍
👎
Non-parametric baseline hazard
Parametric scalar
We discretize the [0,1] interval, put the probabilities into bins, and average them. We hope that:
This is like creating a new prediction function, F, that maps model predictions to outcomes.
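The binning idea above can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the talk: the function name `binned_calibration`, the bin count, and the sample data are all assumptions. For a well-calibrated model, the mean prediction in each bin should be close to the observed frequency in that bin.

```python
def binned_calibration(predicted, outcomes, n_bins=10):
    """Return (mean predicted, observed frequency, count) per non-empty bin."""
    sums_p = [0.0] * n_bins
    sums_y = [0.0] * n_bins
    counts = [0] * n_bins
    for p, y in zip(predicted, outcomes):
        # Map p in [0, 1] to a bin index; p == 1.0 falls in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        sums_p[idx] += p
        sums_y[idx] += y
        counts[idx] += 1
    return [
        (sums_p[i] / counts[i], sums_y[i] / counts[i], counts[i])
        for i in range(n_bins)
        if counts[i] > 0
    ]


# Hypothetical predictions and binary outcomes, for illustration only.
predicted = [0.10, 0.15, 0.80, 0.85, 0.90, 0.20]
outcomes = [0, 0, 1, 1, 1, 1]
for mean_p, freq, n in binned_calibration(predicted, outcomes, n_bins=5):
    print(f"mean predicted={mean_p:.2f}  observed={freq:.2f}  n={n}")
```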
The binning approach is coarse and drops information.
One could use a parametric model:
The more flexible the better (it is competing with a non-parametric histogram).
Apply the same idea to survival model calibration.
Given a fixed time, t, we output probabilities of subjects being alive.
We need to connect these probabilities, p, to realized data (T, E).
We can use a highly flexible parametric survival model.
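As a rough stand-in for the flexible parametric model mentioned above, one simple way to connect p to (T, E) is to bin subjects by their predicted survival probability at t, then estimate the observed survival at t inside each bin with a Kaplan-Meier estimator, which accounts for censored subjects. This swaps in a non-parametric technique for illustration. The function names, bin count, and data below are assumptions, and E = 1 is assumed to mean the event was observed.

```python
def km_survival_at(durations, events, t):
    """Kaplan-Meier estimate of P(T > t); events[i] == 1 means observed."""
    surv = 1.0
    event_times = sorted(
        {d for d, e in zip(durations, events) if e == 1 and d <= t}
    )
    for time in event_times:
        at_risk = sum(1 for d in durations if d >= time)
        deaths = sum(
            1 for d, e in zip(durations, events) if d == time and e == 1
        )
        surv *= 1.0 - deaths / at_risk
    return surv


def survival_calibration(pred_surv, durations, events, t, n_bins=5):
    """Per bin: (mean predicted survival at t, Kaplan-Meier survival at t, n)."""
    bins = {}
    for p, d, e in zip(pred_surv, durations, events):
        idx = min(int(p * n_bins), n_bins - 1)
        bins.setdefault(idx, []).append((p, d, e))
    rows = []
    for idx in sorted(bins):
        ps, ds, es = zip(*bins[idx])
        rows.append((sum(ps) / len(ps), km_survival_at(ds, es, t), len(ps)))
    return rows


# Hypothetical predicted survival probabilities at t = 5, with (T, E) data.
pred = [0.9, 0.9, 0.8, 0.1, 0.1, 0.2]
T = [10, 12, 7, 1, 2, 6]
E = [1, 1, 0, 1, 1, 1]
for mean_p, observed, n in survival_calibration(pred, T, E, t=5, n_bins=2):
    print(f"mean predicted={mean_p:.2f}  observed (KM)={observed:.2f}  n={n}")
```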
Metrics like the hazard ratio (from Cox model), or log-rank, are hard to interpret.
Plus they tell you little about the future - how many years do I have left to live?
Median survival time is okay - but doesn't always exist.
Restricted Mean Survival Time is the new standard. (RMST)
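RMST up to a horizon tau is the area under the survival curve on [0, tau], so unlike the median it always exists. A minimal plain-Python sketch, assuming a Kaplan-Meier estimate of the survival curve (the function names and the choice tau = 50 are illustrative assumptions, and E = 1 is assumed to mean the event was observed):

```python
def kaplan_meier(durations, events):
    """Return [(time, survival)] steps of the Kaplan-Meier curve."""
    steps, surv = [], 1.0
    for time in sorted({d for d, e in zip(durations, events) if e == 1}):
        at_risk = sum(1 for d in durations if d >= time)
        deaths = sum(
            1 for d, e in zip(durations, events) if d == time and e == 1
        )
        surv *= 1.0 - deaths / at_risk
        steps.append((time, surv))
    return steps


def rmst(durations, events, tau):
    """Area under the Kaplan-Meier step function on [0, tau]."""
    area, prev_time, prev_surv = 0.0, 0.0, 1.0
    for time, surv in kaplan_meier(durations, events):
        if time >= tau:
            break
        area += prev_surv * (time - prev_time)
        prev_time, prev_surv = time, surv
    # Add the final rectangle from the last step before tau up to tau.
    return area + prev_surv * (tau - prev_time)


# Only the nine visible rows of the (T, E) table shown earlier.
T = [60, 60, 60, 15, 69, 45, 17, 48, 60]
E = [1, 1, 0, 0, 1, 1, 1, 1, 1]
print(rmst(T, E, tau=50))
```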
Python:
R: