Who am I?


Bayesian Methods for Hackers


An open-source introductory textbook on Bayesian methods
in Python!




lifelines



Survival Analysis in Python






lifelines is a Python library, developed here at Shopify and later open-sourced, to measure durations.

Durations-tf?


  • Historically, survival analysis was developed and used by actuaries and medical researchers to measure the lifetimes of populations. 
  • "What is the expected lifetime of patients given drug A? Drug B?"
  • "What is the life-expectancy of a baby born today in Canada?"


These researchers wanted to measure the duration between Birth and Death.



Censorships



  • It's not that simple. We don't always see the death event occur: the current time, or other events, can censor us from seeing it.

  • "Not all patients have died, how can I use that data?"
  • "How can I measure population lifetimes when most of my population hasn't died yet?"



Modern Survival Analysis

  1. Birth: customer joins Shopify
     Death: customer leaves Shopify
     Censorship: the current time prevents us from seeing all cancellations.
  2. Birth: leader forms government
     Death: government dissolves
     Censorship: the current time prevents us from seeing all dissolutions.
  3. Birth: couple starts dating
     Death: couple breaks up
     Censorship: some couples never break up (a partner's death comes first).
  4. Birth: senator enters office
     Death: senator retires
     Censorship: senators can die before retiring.
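
A minimal sketch of how the first example could be turned into data: each subject becomes a duration plus a flag saying whether the death event was actually observed. The helper name `to_duration`, the customer dates, and the choice of days as the time unit are all hypothetical, made up for illustration.

```python
from datetime import date

def to_duration(birth, death, today):
    """Return (duration_in_days, observed) for one subject.

    observed is True when the death event was seen on or before `today`,
    and False when the subject is still alive at `today` (right-censored).
    """
    if death is not None and death <= today:
        return (death - birth).days, True
    return (today - birth).days, False

# Hypothetical customers: (join date, cancel date or None if still active).
customers = [
    (date(2014, 1, 1), date(2014, 3, 2)),
    (date(2014, 2, 1), None),
    (date(2014, 4, 15), date(2014, 5, 15)),
]
today = date(2014, 6, 1)

records = [to_duration(birth, death, today) for birth, death in customers]
```

The second customer has not cancelled yet, so all we know is that their lifetime is at least as long as the time observed so far. That is still information, and the estimators below use it.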





The main application is constructing the Survival Curve



Example:










Math Time. 

Survival Curve




T          is the lifetime of a member of the population (a random variable)

t            denotes time

S(t)  is the survival curve at time t

S(t) = P( T > t )

(eg: what is the probability that the individual lives longer than t?)

Survival Curve





Completely defines the population's distribution of lifetimes.
  If I know the survival curve, I know everything.




Survival Curve






But I don't know the survival curve =P
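
One standard way to estimate the survival curve from censored data is the Kaplan-Meier estimator. The sketch below is a plain-Python illustration with made-up data, not the lifelines implementation; the function name `kaplan_meier` is hypothetical.

```python
def kaplan_meier(durations, observed):
    """Return [(t, S_hat(t))] at each distinct observed death time.

    At each death time t the estimate is multiplied by
    (1 - deaths_at_t / number_at_risk_at_t). Censored subjects count as
    at risk up to and including their censoring time, but never as deaths.
    """
    death_times = sorted({t for t, d in zip(durations, observed) if d})
    curve = []
    s = 1.0
    for t in death_times:
        at_risk = sum(1 for u in durations if u >= t)
        deaths = sum(1 for u, d in zip(durations, observed) if d and u == t)
        s *= 1.0 - deaths / at_risk
        curve.append((t, s))
    return curve

# Made-up data: six subjects, two of them censored (observed = False).
durations = [5, 6, 6, 2, 4, 4]
observed = [True, False, True, True, False, True]

curve = kaplan_meier(durations, observed)
for t, s in curve:
    print(f"S({t}) ~ {s:.3f}")
```

Dropping the censored subjects entirely would bias the curve downward, because those subjects are known to have survived at least as long as they were observed.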







Make IPython go now

Hazard Curve






S(t) = e ^ { -H(t) }




(sorry for the ugly equation)
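
The identity holds when H(t) is read as the cumulative hazard. A minimal sketch, assuming the Nelson-Aalen estimator for H(t) and the same made-up data style as before: estimate H, then map it to a survival curve with exp(-H). The function name `nelson_aalen` is hypothetical and this is not the lifelines implementation.

```python
import math

def nelson_aalen(durations, observed):
    """Return [(t, H_hat(t))]: the estimated cumulative hazard.

    At each distinct death time t, add deaths_at_t / number_at_risk_at_t.
    """
    death_times = sorted({t for t, d in zip(durations, observed) if d})
    curve = []
    h = 0.0
    for t in death_times:
        at_risk = sum(1 for u in durations if u >= t)
        deaths = sum(1 for u, d in zip(durations, observed) if d and u == t)
        h += deaths / at_risk
        curve.append((t, h))
    return curve

# Made-up data: six subjects, two of them censored (observed = False).
durations = [5, 6, 6, 2, 4, 4]
observed = [True, False, True, True, False, True]

hazard = nelson_aalen(durations, observed)
# Apply S(t) = exp(-H(t)) pointwise to get a survival curve.
survival = [(t, math.exp(-h)) for t, h in hazard]
```

The cumulative hazard only ever increases, so the survival curve derived from it only ever decreases, as it should.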

Hazard Curve



Recall how I said:

If I know the survival curve, I know everything


Equivalently,


If I know the hazard curve, I know everything








Make IPython go now

Recall:






S(t) = e ^ { -H(t) }







What if I had more data?




  1. Someone is wanted for murder - does this play a larger role in them being caught? 
  2. Maybe someone is young, or old? How does this affect their chances of being caught?


Survival Regression



x = (age, crime, ...)


S(t| x) = e ^ { -H(t | x) }

Two models

Aalen's Model




x = (x1, x2, ... )

H( t | x ) = b1(t)*x1 + b2(t)*x2 + ...
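
A minimal sketch of evaluating this additive form for one individual. The coefficient functions and covariate values below are invented purely for illustration (in practice the b_i(t) are estimated from data), and `aalen_cumulative_hazard` is a hypothetical helper name.

```python
import math

def aalen_cumulative_hazard(t, x, coef_funcs):
    """H(t | x) = b1(t)*x1 + b2(t)*x2 + ... with time-varying coefficients."""
    return sum(b(t) * xi for b, xi in zip(coef_funcs, x))

# Invented coefficient functions, one per covariate.
coef_funcs = [
    lambda t: 0.01 * t,            # b1(t), multiplies age
    lambda t: 0.5 * math.sqrt(t),  # b2(t), multiplies the crime indicator
]

x = (30.0, 1.0)  # (age, crime indicator)
H = aalen_cumulative_hazard(4.0, x, coef_funcs)
S = math.exp(-H)  # S(t | x) = e^{-H(t | x)}
```

Because each coefficient is a function of time, a covariate's effect is allowed to grow or fade as time passes.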





Two models

Cox's Model




x = (x1, x2, ... )

H( t | x ) = b(t)*e ^ { b1*x1 + b2*x2 + ... }
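
A minimal sketch of what this multiplicative form implies: the ratio of hazards between two individuals does not depend on time. The baseline function, coefficients, and covariate values are invented for illustration, and `cox_cumulative_hazard` is a hypothetical helper name.

```python
import math

def cox_cumulative_hazard(t, x, baseline, coefs):
    """H(t | x) = b(t) * exp(b1*x1 + b2*x2 + ...)."""
    linear = sum(b * xi for b, xi in zip(coefs, x))
    return baseline(t) * math.exp(linear)

def baseline(t):
    # Invented baseline b(t), shared by the whole population.
    return 0.1 * t

coefs = (0.02, 0.7)   # invented (b1, b2) for (age, crime indicator)

young = (20.0, 0.0)
old = (40.0, 0.0)

# The baseline cancels in the ratio, leaving exp(b1 * (40 - 20)) at any t.
ratio_early = (cox_cumulative_hazard(1.0, old, baseline, coefs)
               / cox_cumulative_hazard(1.0, young, baseline, coefs))
ratio_late = (cox_cumulative_hazard(10.0, old, baseline, coefs)
              / cox_cumulative_hazard(10.0, young, baseline, coefs))
```

Contrast this with Aalen's model, where the coefficients themselves vary with time: here only the baseline b(t) does, and the covariates scale it by a constant factor.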










Make IPython go now.

What else does lifelines have?



  1. Statistical tests (p-values and the like)
  2. Cross-validation for model selection
  3. Utils for transforming lifetables into durations
  4. Artificial data generating library (for testing and debugging methods)
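
To illustrate why artificial data is handy for testing, here is a minimal standard-library sketch that simulates censored lifetimes with known parameters. It does not use lifelines' own generators; the function name, the exponential distributions, and the rates are all assumptions made for this example.

```python
import random

def simulate_censored_lifetimes(n, death_rate, censor_rate, seed=0):
    """Simulate n subjects with exponential death and censoring times.

    Each subject's recorded duration is whichever comes first; the death
    is observed only if it happens no later than the censoring time.
    """
    rng = random.Random(seed)
    durations, observed = [], []
    for _ in range(n):
        death = rng.expovariate(death_rate)
        censor = rng.expovariate(censor_rate)
        durations.append(min(death, censor))
        observed.append(death <= censor)
    return durations, observed

durations, observed = simulate_censored_lifetimes(
    1000, death_rate=0.5, censor_rate=0.25
)
fraction_observed = sum(observed) / len(observed)
```

Since the true death rate is known here, an estimator can be checked against it: with these rates, roughly two thirds of deaths should be observed, and the true survival curve is e^{-0.5 t}.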










Thanks!  =)

lifelines - Survival Analysis in Python

By Cam DP
