Modern Survival Analysis

Cam Davidson-Pilon

PyData Piraeus Meetup 🎉

July 31, 2020

statistics

Machine Learning

Classification

Regression

Causal Inference

Dim. Reduction
....

Survival Analysis

What is survival analysis?

Measuring the time between events*

*Later, we will generalize this

Time between contracting a disease and death.
How long a politician stays in office.
Time between a user signing up and the user leaving.
How long a packet takes to travel from data center to data center.
The lifespan of an organism.

Survival Analysis vs. Regression

Survival analysis: the duration should be non-negative.
Regression: not necessarily, but you could transform the output to a positive value.

Survival analysis: the event may or may not occur.
Regression: ???

Survival Analysis vs. Classification

Survival analysis: the duration should be non-negative, and the event may or may not occur.
Classification: naturally models whether an event occurs, but throws out important information about how long it took to occur (or not).
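
An event that "may or may not occur" within the observation window is a censored observation: all we know is that the true duration exceeds what was recorded. The sketch below is only an illustration under made-up assumptions (exponential lifetimes, a fixed study cutoff, and the names `TRUE_MEAN` and `CUTOFF` are all hypothetical). It shows that averaging the recorded durations as if they were ordinary regression targets underestimates the mean lifetime, while an estimate that uses the event indicator does not.

```python
import random

random.seed(0)

TRUE_MEAN = 10.0   # hypothetical mean lifetime
CUTOFF = 8.0       # hypothetical end of the observation window

durations, observed = [], []
for _ in range(20_000):
    lifetime = random.expovariate(1.0 / TRUE_MEAN)
    durations.append(min(lifetime, CUTOFF))          # T: the duration we get to record
    observed.append(1 if lifetime <= CUTOFF else 0)  # E: 1 = event seen, 0 = censored

# Naive view: average the recorded durations and ignore E.
naive_mean = sum(durations) / len(durations)

# Censoring-aware estimate for an exponential model: total time at risk
# divided by the number of observed events (the maximum likelihood estimate).
censoring_aware_mean = sum(durations) / sum(observed)

print(f"naive: {naive_mean:.2f}, censoring-aware: {censoring_aware_mean:.2f}")
```

The second estimate is specific to the exponential assumption; the point is only that the pair (T, E) carries information that neither a plain regression on T nor a classifier on E uses in full.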

Pieces of Survival Analysis

Survival Function

S(t) := P(T > t)

Hazard function

h(t) := \lim_{\delta t \rightarrow 0} \; \frac{P(t \le T \le t + \delta t \;|\; T \ge t)}{\delta t}

h(t) := \text{Probability per unit time that you die this instant,} \\ \text{given you haven't died by time $t$}

h(t) = -\frac{S'(t)}{S(t)}

S(t) = \exp{\left(-\int_0^t h(s) ds\right)}

Cumulative hazard function

H(t) := \int_0^t h(s) ds

S(t) = \exp{\left(-H(t)\right)}
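
The three pieces determine one another, which can be checked numerically. The following is a minimal sketch assuming a Weibull lifetime with made-up parameters (`LAMBDA` and `RHO` are illustrative, not values from the talk): it recovers the hazard from the survival function by finite differences and recovers the survival function by integrating the hazard.

```python
import math

LAMBDA, RHO = 2.0, 1.5  # hypothetical Weibull scale and shape

def survival(t):
    # S(t) = exp(-(t / lambda)^rho)
    return math.exp(-((t / LAMBDA) ** RHO))

def hazard(t):
    # closed form of -S'(t) / S(t) for the Weibull
    return (RHO / LAMBDA) * (t / LAMBDA) ** (RHO - 1)

def cumulative_hazard(t, steps=10_000):
    # midpoint-rule approximation of the integral of h(s) over [0, t]
    ds = t / steps
    return sum(hazard((i + 0.5) * ds) for i in range(steps)) * ds

t = 1.7
eps = 1e-6

# h(t) = -S'(t) / S(t), with S'(t) from a central finite difference
numeric_hazard = -(survival(t + eps) - survival(t - eps)) / (2 * eps) / survival(t)

# S(t) = exp(-H(t)), with H(t) from numerical integration of h
survival_from_hazard = math.exp(-cumulative_hazard(t))

print(numeric_hazard, hazard(t))
print(survival_from_hazard, survival(t))
```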

In survival analysis, you're modelling one of these

Let's model the survival function

Data -> Kaplan-Meier

T	E
60	1
60	1
60	0
15	0
69	1
45	1
17	1
48	1
60	1
...	...
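
To make the Data -> Kaplan-Meier step concrete, here is a small pure-Python sketch of the product-limit estimate. It uses only the nine rows shown above (the elided rows are not guessed at), reading T as the duration and E as 1 for an observed event and 0 for a censored one. The function name `kaplan_meier` is illustrative; in practice a library such as lifelines would be used instead.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct time with an observed event."""
    deaths = Counter(t for t, e in zip(durations, events) if e == 1)
    surv, curve = 1.0, []
    for t in sorted(deaths):
        # subjects still under observation just before time t
        at_risk = sum(1 for d in durations if d >= t)
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve

# The nine rows visible in the table above.
T = [60, 60, 60, 15, 69, 45, 17, 48, 60]
E = [1, 1, 0, 0, 1, 1, 1, 1, 1]

km_curve = kaplan_meier(T, E)
for t, s in km_curve:
    print(f"S({t}) = {s:.3f}")
```

Censored rows (E = 0) never cause a drop in the curve, but they do count in the at-risk denominator up to the time they leave the study.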

Kaplan-Meier is non-parametric

As are:

mean
median
empirical CDF
random forests / decision trees

Modern Survival Analysis Tip #1

Regression typically models the cumulative hazard

S(t) = \exp{\left(-H(t)\right)}

H(t) := \int_0^t h(s) ds

H(t\;|\;x) = f(t, \theta(x))

H(t\;|\;x) = \left(\frac{t}{\lambda(x)}\right)^{\rho(x)}

H(t\;|\;x) = \frac{t}{\lambda(x)}

H(t\;|\;x) = \text{NN}(t, W(x))

H(t\;|\;x) = \frac{1}{B}\sum_{b=1}^B H_b(t, x)
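
As one concrete instance of H(t | x) = f(t, θ(x)), the Weibull form above can be sketched in a few lines. Everything numeric here is hypothetical: the coefficients are invented, and the log-linear links for λ(x) and ρ(x) are an assumption chosen to keep both parameters positive, not something stated on the slides.

```python
import math

# Hypothetical coefficients; in practice these would be fitted to data.
LAMBDA_COEFS = (2.0, -0.5)   # log lambda(x) = 2.0 - 0.5 * x
RHO_COEFS = (0.3, 0.0)       # log rho(x)    = 0.3 (no covariate effect)

def lam(x):
    a, b = LAMBDA_COEFS
    return math.exp(a + b * x)

def rho(x):
    a, b = RHO_COEFS
    return math.exp(a + b * x)

def cumulative_hazard(t, x):
    # H(t | x) = (t / lambda(x)) ** rho(x)
    return (t / lam(x)) ** rho(x)

def survival(t, x):
    # S(t | x) = exp(-H(t | x))
    return math.exp(-cumulative_hazard(t, x))

for x in (0.0, 1.0):
    print(f"x={x}: H(5|x)={cumulative_hazard(5.0, x):.3f}, S(5|x)={survival(5.0, x):.3f}")
```

Swapping `cumulative_hazard` for a neural network or an average over an ensemble leaves `survival` unchanged, which is the appeal of modelling on this scale.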

Why model the cumulative hazard?

Independent causes of death are additive on the cumulative hazard scale.

S(t) = S_1(t) S_2(t) \Leftrightarrow H(t) = H_1(t) + H_2(t)

Differentiating the cumulative hazard to recover the hazard (👍) is much easier than integrating the hazard to recover the cumulative hazard (👎).
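
Both points can be checked with a toy example. The sketch below assumes two invented, independent causes of death (the functions `H1` and `H2` are hypothetical): their cumulative hazards add, the corresponding survival functions multiply, and the hazard falls out of the cumulative hazard with a one-line finite difference.

```python
import math

# Two hypothetical, independent causes of death.
def H1(t):
    return 0.1 * t            # constant hazard of 0.1

def H2(t):
    return (t / 5.0) ** 2     # Weibull-type cumulative hazard, scale 5, shape 2

def H(t):
    return H1(t) + H2(t)      # combined cumulative hazard

def S(t, cum_hazard):
    return math.exp(-cum_hazard(t))

def hazard(t, cum_hazard, eps=1e-6):
    # central finite difference: h(t) = H'(t)
    return (cum_hazard(t + eps) - cum_hazard(t - eps)) / (2 * eps)

t = 3.0
print(S(t, H), S(t, H1) * S(t, H2))   # survival functions multiply
print(hazard(t, H))                   # hazard recovered by differentiation
```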

"I've heard of the cox proportional model - what's that?"

Cox Proportional Hazards model

Used because it's semi-parametric

h(t\;|\;x) = h_0(t) \exp{\left(\mathbf{\beta} x\right)}

h_0(t): non-parametric baseline hazard

\exp{\left(\mathbf{\beta} x\right)}: parametric scalar

The non-parametric part is nice: "it makes fewer assumptions about the form"

Modern Survival Analysis Tip #2

Cox Proportional Hazards model

Don't bother with it

Cox Proportional Hazards model

The non-parametric part is nice: "it makes fewer assumptions about the form"
But it carries a lot of hidden, and quite strict, assumptions.
The coefficients are non-collapsible.
The coefficients cannot be interpreted causally.
Prediction is less efficient than other methods.

h(t\;|\;x) = h_0(t) \exp{\left(\mathbf{\beta} x\right)}
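
One of those strict assumptions is visible directly in the formula: whatever the baseline h_0(t) looks like, the ratio of hazards between two covariate values is the same at every time t. A minimal sketch, with an invented coefficient `BETA` and an invented baseline hazard (the Cox model itself leaves the baseline unspecified):

```python
import math

BETA = 0.7  # hypothetical coefficient

def baseline_hazard(t):
    # hypothetical baseline; any positive function of t would do
    return 0.05 + 0.01 * t

def hazard(t, x):
    # h(t | x) = h_0(t) * exp(beta * x)
    return baseline_hazard(t) * math.exp(BETA * x)

# Hazard ratio between x = 1 and x = 0 at several times.
ratios = [hazard(t, 1.0) / hazard(t, 0.0) for t in (0.5, 1.0, 5.0, 20.0)]
print(ratios, math.exp(BETA))
```

If the effect of a covariate fades or grows over time in the real data, no choice of coefficient can reproduce it under this form.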

Calibration of Survival Models

Modern Survival Analysis Tip #3

f: model's probability outputs ↦ observed outcomes

We discretize the [0,1] interval, put the probabilities into bins, and average them. We hope that, within each bin B:

\frac{1}{n_B}\sum_{\text{clf}(x_i) \in B} \text{clf}(x_i) \approx \frac{1}{n_B} \sum_{\text{clf}(x_i) \in B} y_i

where n_B is the number of predictions that fall in B.

This is like creating a new prediction function, F, that maps model predictions to outcomes. With B = B(\text{clf}(x)) the bin containing \text{clf}(x):

F(x) = \frac{1}{n_B}\sum_{\text{clf}(x_i) \in B(\text{clf}(x))} y_i
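
The binned comparison can be sketched as follows. The data are synthetic and calibrated by construction (each outcome is drawn with exactly its predicted probability), which is an assumption made purely for illustration; `N_BINS` and the variable names are likewise arbitrary.

```python
import random

random.seed(1)

# Synthetic, well-calibrated classifier: outcomes are drawn with exactly
# the predicted probability (an assumption made for this illustration).
predictions = [random.random() for _ in range(50_000)]
outcomes = [1 if random.random() < p else 0 for p in predictions]

N_BINS = 10
sums_pred = [0.0] * N_BINS
sums_obs = [0.0] * N_BINS
counts = [0] * N_BINS

for p, y in zip(predictions, outcomes):
    b = min(int(p * N_BINS), N_BINS - 1)  # index of the bin containing p
    sums_pred[b] += p
    sums_obs[b] += y
    counts[b] += 1

# One (mean prediction, observed event rate) pair per non-empty bin.
calibration = [
    (sums_pred[b] / counts[b], sums_obs[b] / counts[b])
    for b in range(N_BINS) if counts[b] > 0
]
for mean_pred, mean_obs in calibration:
    print(f"mean prediction {mean_pred:.3f} -> observed rate {mean_obs:.3f}")
```

For a calibrated model the two columns track each other; systematic gaps in one direction indicate over- or under-confidence.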

The binning approach is coarse and drops information.

One could use a parametric model:

y_i \approx \text{Logit}(\text{clf}(x_i))

The more flexible the better (it is competing with a non-parametric histogram).

Apply the same idea to survival model calibration.

Given a fixed time, t, we output probabilities of subjects being alive.

We need to connect these probabilities, p, to realized data (T, E).

We can use a highly flexible parametric survival model.
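
The sketch below is not the flexible parametric approach named above. It swaps in a coarser, binned stand-in so that the idea of connecting p to (T, E) fits in a few lines: group subjects by predicted survival probability at a fixed time, then compare each group's mean prediction with a Kaplan-Meier estimate at that time, which accounts for censoring. The data are synthetic and calibrated by construction, and all names and constants (`T0`, the rate range, the censoring window) are hypothetical.

```python
import math
import random
from collections import Counter

random.seed(2)
T0 = 5.0  # fixed time at which survival probabilities are compared

def km_at(durations, events, t0):
    """Kaplan-Meier estimate of S(t0)."""
    deaths = Counter(d for d, e in zip(durations, events) if e and d <= t0)
    surv = 1.0
    for t in sorted(deaths):
        at_risk = sum(1 for d in durations if d >= t)
        surv *= 1.0 - deaths[t] / at_risk
    return surv

subjects = []
for _ in range(4000):
    rate = random.uniform(0.02, 0.4)          # hypothetical per-subject hazard
    lifetime = random.expovariate(rate)
    censor_time = random.uniform(0.0, 20.0)   # independent censoring
    duration = min(lifetime, censor_time)
    event = 1 if lifetime <= censor_time else 0
    predicted = math.exp(-rate * T0)          # model's P(T > T0), calibrated by construction
    subjects.append((predicted, duration, event))

# Split subjects into equal-sized groups by predicted probability.
subjects.sort()
N_BINS = 4
size = len(subjects) // N_BINS
calibration = []
for b in range(N_BINS):
    group = subjects[b * size:(b + 1) * size]
    mean_pred = sum(p for p, _, _ in group) / len(group)
    observed = km_at([d for _, d, _ in group], [e for _, _, e in group], T0)
    calibration.append((mean_pred, observed))

for mean_pred, observed in calibration:
    print(f"predicted S({T0}) = {mean_pred:.3f}, Kaplan-Meier S({T0}) = {observed:.3f}")
```

This version inherits the coarseness of binning noted earlier; a flexible parametric survival model regressed on the predictions avoids the arbitrary bin edges.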

Do I have time for Modern Survival Analysis Tip #4?

Summary metrics

Metrics like the hazard ratio (from the Cox model) or the log-rank statistic are hard to interpret.

Plus they tell you little about the future - how many years do I have left to live?

Median survival time is okay - but doesn't always exist (the survival curve may never drop to 0.5).

Restricted Mean Survival Time (RMST) is the new standard.

\text{RMST}(t) = \int_0^t S(s) ds
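
Because a Kaplan-Meier curve is a step function, the RMST integral reduces to a sum of rectangle areas. A minimal sketch, reusing only the nine (T, E) rows shown in the data table earlier in the deck; the helper names `kaplan_meier` and `rmst` are illustrative, and lifelines ships its own utility for this.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct time with an observed event."""
    deaths = Counter(t for t, e in zip(durations, events) if e == 1)
    surv, curve = 1.0, []
    for t in sorted(deaths):
        at_risk = sum(1 for d in durations if d >= t)
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve

def rmst(curve, tau):
    """Area under the Kaplan-Meier step function from 0 to tau."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in curve:
        if t >= tau:
            break
        area += prev_s * (t - prev_t)   # rectangle up to the next drop
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)  # final partial rectangle up to tau

T = [60, 60, 60, 15, 69, 45, 17, 48, 60]
E = [1, 1, 0, 0, 1, 1, 1, 1, 1]

rmst_60 = rmst(kaplan_meier(T, E), tau=60)
print(f"RMST(60) = {rmst_60:.2f}")
```

The result reads directly as "expected time alive during the first 60 time units", which is the interpretability argument for RMST over a hazard ratio.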

In conclusion

Use KMunicate-style plots for presenting Kaplan-Meier results
Skip the Cox Proportional Hazards model (if you can)
Use survival probability calibration plots
RMST (restricted mean survival time) is often preferable to other summary metrics.

Questions?

Software

Python:

lifelines
scikit-survival
pycox

R:

survival
flexsurvreg
