Modern

 Survival Analysis

Cam Davidson-Pilon

PyData Piraeus Meetup 🎉

July 31, 2020

statistics

statistics

Machine
Learning

statistics

Machine
Learning

Classification

statistics

Machine
Learning

Classification

Regression

statistics

Machine
Learning

Classification

Regression

Causal Inference

statistics

Machine
Learning

Classification

Regression

Causal Inference

Dim. Reduction
....

statistics

Machine
Learning

Classification

Regression

Causal Inference

Dim. Reduction
....

Survival Analysis

statistics

Machine
Learning

Classification

Regression

Causal Inference

Dim. Reduction
....

Survival Analysis

What is survival Analysis?

What is survival Analysis?

Measuring the time between events*

What is survival Analysis?

Measuring the time between events*

*Later, we will generalize this

  • Time between contracting a disease, and death.
  • Time between contracting a disease, and death.
  • How long a politician is in office for.
  • Time between contracting a disease, and death.
  • How long a politician is in office for.
  • Time between a user sign up and the user leaving.
  • Time between contracting a disease, and death.
  • How long a politician is in office for.
  • Time between a user sign up and the user leaving.
  • How long a packet takes to travel from data center to data center.
  • Time between contracting a disease, and death.
  • How long a politician is in office for.
  • Time between a user sign up and the user leaving.
  • How long a packet takes to travel from data center to data center.
  • The lifespan of an organism.

Survival Analysis

Regression

Survival Analysis

  • duration should be non-negative

Regression

Survival Analysis

  • duration should be non-negative

Regression

  • not necessarily, but you could transform the output to a positive value

Survival Analysis

  • duration should be non-negative
  • the event may or may not occur

Regression

  • not necessarily, but you could transform the output to a positive value

Survival Analysis

  • duration should be non-negative
  • the event may or may not occur

Regression

  • not necessarily, but you could transform the output to a positive value
  • ???

Survival Analysis

  • duration should be non-negative
  • the event may or may not occur

Classification

Survival Analysis

  • duration should be non-negative
  • the event may or may not occur

Classification

  • naturally models if an event occurs.

Survival Analysis

  • duration should be non-negative
  • the event may or may not occur

Classification

  • naturally models if an event occurs.
  • But throws out important information about how long it took to occur (or not)

Pieces of Survival Analysis

Survival Function

Survival Function

S(t) := P(T > t)

Hazard function

Hazard function

h(t):= \lim_{\delta t \rightarrow 0 } \; \frac{Pr( t \le T \le t + \delta t | T > t)}{\delta t}

Hazard function

h(t):= \text{Probability you die this instant,} \\ \text{given you haven't died by time $t$}

Hazard function

h(t) = -\frac{S'(t)}{S(t) }

Hazard function

S(t) = \exp{\left(-\int_0^t h(s) ds\right)}

Cumulatuve Hazard function

S(t) = \exp{\left(-H(t)\right)}

Cumulatuve Hazard function

S(t) = \exp{\left(-H(t)\right)}
H(t) := \int_0^t h(s) ds

In survival analysis, you're modelling one of these

Let's model the survival function

Data -> Kaplan-Meier 

T E
60 1
60 1
60 0
15 0
69 1
45 1
17 1
48 1
60 1
... ...

Kaplan Meier is non-parametric

As are:

Kaplan Meier is non-parametric

As are:

  • mean
  • median
  • empirical CDF

Kaplan Meier is non-parametric

As are:

  • mean
  • median
  • empirical CDF
  • random forests / decision trees

Modern survival

Analysis Tip #1

Regression typically models the cumulative hazard

Regression typically models the cumulative hazard

S(t) = \exp{\left(-H(t)\right)}
H(t) := \int_0^t h(s) ds
H(t\;|\;x) = f(t, \theta(x))
H(t\;|\;x) = f(t, \theta(x))
H(t\;|\;x) = \frac{t}{\lambda(x)}
H(t\;|\;x) = f(t, \theta(x))
H(t\;|\;x) = \left(\frac{t}{\lambda(x)}\right)^{\rho(x)}
H(t\;|\;x) = \frac{t}{\lambda(x)}
H(t\;|\;x) = f(t, \theta(x))
H(t\;|\;x) = \left(\frac{t}{\lambda(x)}\right)^{\rho(x)}
H(t\;|\;x) = \frac{t}{\lambda(x)}
H(t\;|\;x) = \text{NN}(t, W(x))
H(t\;|\;x) = f(t, \theta(x))
H(t\;|\;x) = \left(\frac{t}{\lambda(x)}\right)^{\rho(x)}
H(t\;|\;x) = \frac{t}{\lambda(x)}
H(t\;|\;x) = \text{NN}(t, W(x))
H(t\;|\;x) = \frac{1}{B}\sum_{b=1}^B H_b(t, x)

Why model the cumulative hazard?

  • Independent causes of death are additive on the cumulative hazard scale.

Why model the cumulative hazard?

  • Independent causes of death are additive on the cumulative hazard scale.
S(t) = S_1(t) S_2(t) \Leftrightarrow H(t) = H_1(t) + H_2(t)

Why model the cumulative hazard?

  • Independent causes of death are additive on the cumulative hazard scale.
S(t) = S_1(t) S_2(t) \Leftrightarrow H(t) = H_1(t) + H_2(t)
  • Differentiating (to recover the hazard) is many times easier than integrating the hazard (to recover the CHF).

Why model the cumulative hazard?

  • Independent causes of death are additive on the cumulative hazard scale.
S(t) = S_1(t) S_2(t) \Leftrightarrow H(t) = H_1(t) + H_2(t)
  • Differentiating (to recover the hazard) is many times easier than integrating the hazard (to recover the CHF).

Why model the cumulative hazard?

  • Independent causes of death are additive on the cumulative hazard scale.
S(t) = S_1(t) S_2(t) \Leftrightarrow H(t) = H_1(t) + H_2(t)
  • Differentiating (to recover the hazard) is many times easier than integrating the hazard (to recover the CHF).

👍

Why model the cumulative hazard?

  • Independent causes of death are additive on the cumulative hazard scale.
S(t) = S_1(t) S_2(t) \Leftrightarrow H(t) = H_1(t) + H_2(t)
  • Differentiating (to recover the hazard) is many times easier than integrating the hazard (to recover the CHF).

👍

👎

"I've heard of the cox proportional model - what's that?"

Cox Proportional Hazard model

  • Used because it's semi-parametric

Cox Proportional Hazard model

  • Used because it's semi-parametric
h(t\;|\;x) = h_0(t) \exp{\left(\mathbf{\beta} x\right)}

Cox Proportional Hazard model

  • Used because it's semi-parametric
h(t\;|\;x) = h_0(t) \exp{\left(\mathbf{\beta} x\right)}

Non-parametric baseline hazard

Cox Proportional Hazard model

  • Used because it's semi-parametric
h(t\;|\;x) = h_0(t) \exp{\left(\mathbf{\beta} x\right)}

Non-parametric baseline hazard

Parametric scalar

Cox Proportional Hazard model

  • Used because it's semi-parametric
h(t\;|\;x) = h_0(t) \exp{\left(\mathbf{\beta} x\right)}

Non-parametric baseline hazard

Parametric scalar

  • the non-parametric part is nice: "it's makes less assumptions about the form"

Modern survival

Analysis Tip #2

Cox Proportional Hazard model

  • Used because it's semi-parametric
h(t\;|\;x) = h_0(t) \exp{\left(\mathbf{\beta} x\right)}

Non-parametric baseline hazard

Parametric scalar

Don't bother with it

  • the non-parametric part is nice: "it's makes less assumptions about the form"

Cox Proportional Hazard model

  • the non-parametric part is nice: "it's makes less assumptions about the form"

Cox Proportional Hazard model

  • the non-parametric part is nice: "it's makes less assumptions about the form"
  • But it carries a lot of hidden, and quite strict, assumptions. 

Cox Proportional Hazard model

  • the non-parametric part is nice: "it's makes less assumptions about the form"
  • But it carries a lot of hidden, and quite strict, assumptions. 
h(t\;|\;x) = h_0(t) \exp{\left(\mathbf{\beta} x\right)}

Cox Proportional Hazard model

  • the non-parametric part is nice: "it's makes less assumptions about the form"
  • But it carries a lot of hidden, and quite strict, assumptions. 
  • The coefficients are non-collapsible.
h(t\;|\;x) = h_0(t) \exp{\left(\mathbf{\beta} x\right)}

Cox Proportional Hazard model

  • the non-parametric part is nice: "it's makes less assumptions about the form"
  • But it carries a lot of hidden, and quite strict, assumptions. 
  • The coefficients are non-collapsible.
  • The coefficients cannot be interpreted causally.
h(t\;|\;x) = h_0(t) \exp{\left(\mathbf{\beta} x\right)}

Cox Proportional Hazard model

  • the non-parametric part is nice: "it's makes less assumptions about the form"
  • But it carries a lot of hidden, and quite strict, assumptions. 
  • The coefficients are non-collapsible.
  • The coefficients cannot be interpreted causally.
  • Prediction is less efficient than other methods. 
h(t\;|\;x) = h_0(t) \exp{\left(\mathbf{\beta} x\right)}

Calibration of Survival Models

Calibration of Survival Models

Modern Survival Analysis tip #3

f: MoDel's Probability outputs
         ↦ observed outcomes

We discretize the [0,1] interval, put the probabilities into bins, and average them. 

f: MoDel's Probability outputs
         ↦ observed outcomes

\frac{1}{n}\sum_{\text{clf}(x_i) \in B} \text{clf}(x_i) \approx \frac{1}{n} \sum_{x_i \in B} y_i

We discretize the [0,1] interval, put the probabilities into bins, and average them. We hope that:

f: MoDel's Probability outputs
         ↦ observed outcomes

\frac{1}{n}\sum_{\text{clf}(x_i) \in B} \text{clf}(x_i) \approx \frac{1}{n} \sum_{x_i \in B} y_i

This is like creating a new prediction function, F, that maps model predictions to outcomes.

We discretize the [0,1] interval, put the probabilities into bins, and average them. We hope that:

F(x) = \frac{1}{n}\sum_{\text{clf}(x_i) \in B(\text{clf}(x))} \text{clf}(x_i)

f: MoDel's Probability outputs
         ↦ observed outcomes

The binning approach is coarse and drops information.

 

f: MoDel's Probability outputs
         ↦ observed outcomes

The binning approach is coarse and drops information.

One could use a parametric model:

y_i \approx \text{Logit}(\text{clf}(x_i))

f: MoDel's Probability outputs
         ↦ observed outcomes

The binning approach is coarse and drops information.

One could use a parametric model:

y_i \approx \text{Logit}(\text{clf}(x_i))

More flexible the better (competing with a non-parametric histogram)

f: MoDel's Probability outputs
         ↦ observed outcomes

The binning approach is coarse and drops information.

One could use a parametric model:

y_i \approx \text{Logit}(\text{clf}(x_i))

More flexible the better (competing with a non-parametric histogram)

 

Apply to same idea to survival model calibrations.

f: MoDel's Probability outputs
         ↦ observed outcomes

Given a fixed time, t, we output probabilities of subjects being alive. 

f: MoDel's Probability outputs
         ↦ observed outcomes

Given a fixed time, t, we output probabilities of subjects being alive. 

We need to connect these probabilities, p, to realized data (T, E).

f: MoDel's Probability outputs
         ↦ observed outcomes

Given a fixed time, t, we output probabilities of subjects being alive. 

We need to connect these probabilities, p, to realized data (T, E). 

 

We can use a highly flexible parametric survival model. 

Do I have time for

Modern Survival Analysis tip #4?

 Summary metrics 

Metrics like the hazard ratio (from Cox model), or log-rank, are hard to interpret.

Metrics like the hazard ratio (from Cox model), or log-rank, are hard to interpret.

 

Plus they tell you little about the future - how many years do I have left to live?

 Summary metrics 

Median survival time is okay - but doesn't always exist. 

 Summary metrics 

Restricted Mean Survival Time is the new standard. (RMST)

 Summary metrics 

Restricted Mean Survival Time is the new standard. (RMST)

 Summary metrics 

\text{RMST}(t) = \int_0^t S(s) ds

In conclusion

  1. Use KMunicate-style for presenting Kaplan-Meier results

In conclusion

  1. Use KMunicate-style for presenting Kaplan-Meier results
  2. Skip the Cox Proportional Hazard model (if you can)

In conclusion

  1. Use KMunicate-style for presenting Kaplan-Meier results
  2. Skip the Cox Proportional Hazard model (if you can)
  3. Use survival probability calibration plots

In conclusion

  1. Use KMunicate-style for presenting Kaplan-Meier results
  2. Skip the Cox Proportional Hazard model (if you can)
  3. Use survival probability calibration plots
  4. RMST (restricted mean survival times) are often preferable to other summary metrics. 

Questions?

Software

Python:

  • lifelines
  • scikit-survival
  • pycox

 

R

  • survival
  • flexsurvreg

Modern Survival Analysis

By Cam DP

Modern Survival Analysis

  • 285