(survival analysis in Python)
According to Google and every business article ever:
1. According to numbers presented in a news release that reported Vodafone Group Plc's results for the six months ended September 2012, Vodafone UK has 10.8 million "pay monthly" customers.
2. In his "Q3 2012 Earnings Conference Call", the CFO of Amazon commented that "active customer accounts exceeded 188 million", where customers are considered active when they have placed an order during the preceding twelve-month period.
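The Amazon figure comes from a simple rule, not a model. Here is a minimal sketch of that rule. It assumes we already know each customer's most recent order date and approximates "twelve months" as 365 days. The function name `count_active` is hypothetical.

```python
from datetime import date, timedelta


def count_active(last_order_dates, as_of, window_days=365):
    """Count customers whose most recent order falls in the trailing window ending at as_of.

    last_order_dates: mapping of customer id -> date of that customer's most recent order
    window_days: length of the "active" window (365 days approximates twelve months)
    """
    cutoff = as_of - timedelta(days=window_days)
    return sum(1 for d in last_order_dates.values() if cutoff < d <= as_of)


last_order = {
    "alice": date(2012, 9, 1),
    "bob": date(2011, 6, 1),    # no order in the past 365 days
    "carol": date(2012, 1, 15),
}
print(count_active(last_order, as_of=date(2012, 9, 30)))  # 2
```

The rule says nothing about whether a customer outside the window is gone for good or just between purchases. That distinction is what the rest of this talk tries to model.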
Contractual: we observe when the customer dies.
Non-contractual: the time at which a customer dies is unobserved.
1. Let's assume customers will continue buying from us until they "die".
2. Their death rate is constant (but the constant differs across individuals).
3. While alive, the time between purchases is exponential (with different rates across individuals).
Call this the Pareto/NBD model.
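Here is a minimal sketch of how one customer's history could be generated under these assumptions. It follows the usual Pareto/NBD setup, which assumes gamma-distributed purchase rates (shape `r`, rate `alpha`) and death rates (shape `s`, rate `beta`) across customers. All function names are hypothetical, and the time unit is arbitrary.

```python
import random


def simulate_customer(lam, mu, T, rng):
    """Return (frequency, recency, T) for one customer under Pareto/NBD-style assumptions.

    lam: purchase rate while alive (exponential gaps between purchases)
    mu:  death rate (exponential, unobserved lifetime)
    T:   observation length, measured from the first purchase at time 0
    """
    death = rng.expovariate(mu)       # when the customer dies; we never observe this
    end = min(death, T)               # purchases only happen while alive and observed
    t, frequency, recency = 0.0, 0, 0.0
    while True:
        t += rng.expovariate(lam)     # exponential wait until the next purchase
        if t > end:
            break
        frequency += 1                # counts repeat purchases only (time 0 is the first)
        recency = t                   # customer's age at the most recent purchase
    return frequency, recency, T


def simulate_population(n, r, alpha, s, beta, T, seed=0):
    """Draw per-customer rates from gamma distributions, then simulate each customer."""
    rng = random.Random(seed)
    customers = []
    for _ in range(n):
        lam = rng.gammavariate(r, 1.0 / alpha)  # gammavariate takes (shape, scale = 1 / rate)
        mu = rng.gammavariate(s, 1.0 / beta)
        customers.append(simulate_customer(lam, mu, T, rng))
    return customers


customers = simulate_population(1000, r=2.0, alpha=4.0, s=1.5, beta=20.0, T=52.0)
```

Note that the simulated output contains only what we would actually observe: the death time is drawn but never returned.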
Graphically: [Figure: four-panel illustration of the model]
Solution: use the existing data to infer these parameters. The nice part: we only need three numbers per customer to do this.
Age (T): how old is the customer, i.e. how long since their first purchase?
Frequency: how many repeat purchases (purchases after the first) did they make?
Recency: how old was the customer at their last purchase?
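Here is a minimal sketch of turning one customer's purchase log into these three numbers, following the convention lifetimes uses: frequency counts repeat purchases only, and recency and age are measured from the first purchase. The name `rfm_summary` is hypothetical. It assumes purchases at the same time unit (e.g. the same day) count once. For whole DataFrames, lifetimes provides `lifetimes.utils.summary_data_from_transaction_data`.

```python
def rfm_summary(purchase_times, observation_end):
    """Summarize one customer's purchase times into (frequency, recency, T).

    purchase_times: times of purchases in a common unit (e.g. day numbers), any order
    observation_end: time at which observation stops, in the same unit
    """
    times = sorted(set(purchase_times))  # duplicates in the same time unit count once
    first, last = times[0], times[-1]
    frequency = len(times) - 1           # repeat purchases only
    recency = last - first               # age at the last purchase
    T = observation_end - first          # age at the end of observation
    return frequency, recency, T


print(rfm_summary([10, 3, 25, 25], observation_end=40))  # (2, 22, 37)
```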
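1. Set up the likelihood of the observed (frequency, recency, age) data.
2. Find its maximum using numerical methods.
3. Plug the maximizing parameters back into the model's distributions to make predictions (e.g. the probability that a customer is still alive).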
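The three steps can be sketched on a deliberately simplified version of the model, in which every customer shares one purchase rate λ and one death rate μ. Because there is no heterogeneity, this is not the full Pareto/NBD likelihood that lifetimes maximizes. A customer with x repeat purchases, the last at t_x, was either still alive at T or died somewhere in (t_x, T], and the likelihood sums those two cases. The sketch assumes SciPy is available; `log_likelihood`, `fit_pooled` and `p_alive` are hypothetical names.

```python
import math

from scipy.optimize import minimize


def log_likelihood(lam, mu, x, t_x, T):
    """Step 1: log-likelihood of one customer's (x, t_x, T) for known lam and mu.

    L = lam^x * [exp(-(lam+mu) T) + mu/(lam+mu) * (exp(-(lam+mu) t_x) - exp(-(lam+mu) T))],
    rewritten with exp(-(lam+mu) t_x) factored out to avoid underflow.
    """
    total = lam + mu
    w = math.exp(-total * (T - t_x))          # weight on "still alive at T"
    inner = w + (1.0 - w) * mu / total         # always >= mu / total > 0
    return x * math.log(lam) - total * t_x + math.log(inner)


def fit_pooled(data):
    """Step 2: maximize the summed log-likelihood numerically (on the log scale)."""
    def neg_ll(log_params):
        lam, mu = math.exp(log_params[0]), math.exp(log_params[1])
        return -sum(log_likelihood(lam, mu, x, t_x, T) for x, t_x, T in data)

    result = minimize(neg_ll, x0=[0.0, 0.0], method="Nelder-Mead")
    return math.exp(result.x[0]), math.exp(result.x[1])


def p_alive(lam, mu, x, t_x, T):
    """Step 3: probability the customer is still alive at T, given the fitted rates."""
    total = lam + mu
    w = math.exp(-total * (T - t_x))
    return w / (w + (1.0 - w) * mu / total)
```

Optimizing over log(λ) and log(μ) keeps both rates positive without needing a constrained optimizer.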
Lifetimes implements this part.
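from lifetimes import ParetoNBDFitter

# rfm_customers: a DataFrame with one row per customer and columns
# 'frequency', 'recency' and 't' (the customer's age, T)
penalizer_coef = 0.01  # small L2 penalty on the parameters to stabilize the fit
model = ParetoNBDFitter(penalizer_coef=penalizer_coef)
model.fit(rfm_customers['frequency'], rfm_customers['recency'], rfm_customers['t'])
print(model)
# <lifetimes.ParetoNBDFitter: fitted with 40063 subjects, a: 0.34, alpha: 945.35, b: 0.39, r: 0.19>

# probability that a customer with no repeat purchases and age 109 is still alive
print(model.conditional_probability_alive(frequency=0, recency=0, T=109))
# 0.46701904847776876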
1. Let's assume customers will continue buying from us until they "die".
2. After each repeat purchase, a customer dies with probability p (p is unique to the individual).
3. While alive, the time between purchases is exponential (with different rates across individuals).
Call this the BG/NBD model.
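As with the Pareto/NBD model, here is a minimal sketch of the BG/NBD story for one customer. It assumes the standard BG/NBD heterogeneity: λ is gamma-distributed with shape `r` and rate `alpha`, and p is beta-distributed with parameters `a` and `b`. The first purchase at time 0 does not count toward frequency, so a customer can only die after a repeat purchase. Function names are hypothetical.

```python
import random


def simulate_bg_nbd_customer(lam, p, T, rng):
    """Return (frequency, recency, T) for one customer under BG/NBD-style assumptions.

    lam: purchase rate while alive (exponential gaps between purchases)
    p:   probability of dying right after each repeat purchase
    T:   observation length, measured from the first purchase at time 0
    """
    t, frequency, recency = 0.0, 0, 0.0
    while True:
        t += rng.expovariate(lam)   # exponential wait until the next purchase
        if t > T:
            break                   # observation ends while the customer is still alive
        frequency += 1
        recency = t
        if rng.random() < p:        # coin flip after each repeat purchase
            break                   # customer dies; no more purchases
    return frequency, recency, T


def simulate_bg_nbd_population(n, r, alpha, a, b, T, seed=0):
    """Draw lam ~ Gamma(r, rate=alpha) and p ~ Beta(a, b) per customer, then simulate."""
    rng = random.Random(seed)
    return [
        simulate_bg_nbd_customer(rng.gammavariate(r, 1.0 / alpha), rng.betavariate(a, b), T, rng)
        for _ in range(n)
    ]
```

The only change from the Pareto/NBD sketch is the death mechanism: a coin flip after each purchase instead of an exponential clock running in continuous time.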
Pretty good! Seasonal sales like Christmas are only a blip in the cumulative number of orders. They might "reawaken" dead customers, though, which our model does not allow for.
Absolutely: this is especially useful for small businesses without much data, where you want to see the uncertainty in your estimates.
Dan W. at Pass the ROC does a great job of this:
http://danielweitzenfeld.github.io/passtheroc/
For example: instead of a point estimate of λ, I want a distribution of what λ might be.
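Here is a minimal sketch of what "a distribution of what λ might be" can look like. It uses the conjugate gamma-Poisson update and is not the method from the linked posts. It assumes a Gamma prior on λ (shape `r`, rate `alpha`) and that the customer was alive for the whole observation period, so dying is ignored here. The prior values and function names are hypothetical.

```python
import random


def posterior_lambda(r, alpha, x, T):
    """Gamma(shape, rate) posterior for a purchase rate λ.

    Prior: λ ~ Gamma(shape=r, rate=alpha). Data: x purchases over T time units,
    assuming the customer was alive (and purchasing at rate λ) the whole time.
    """
    return r + x, alpha + T


def sample_posterior(shape, rate, n=10_000, seed=0):
    rng = random.Random(seed)
    # random.gammavariate is parameterized by (shape, scale), and scale = 1 / rate
    return [rng.gammavariate(shape, 1.0 / rate) for _ in range(n)]


shape, rate = posterior_lambda(r=1.0, alpha=10.0, x=3, T=20.0)
draws = sorted(sample_posterior(shape, rate))
print("posterior mean of λ:", shape / rate)
print("90% interval:", draws[int(0.05 * len(draws))], draws[int(0.95 * len(draws))])
```

With only three purchases, the interval is wide. That width is exactly the uncertainty a point estimate hides.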