Our team should check for bugs and usability issues in the Android app.
# Engineer the churn column
import pandas as pd

# Reference date of the snapshot and the 30-day inactivity window
today = pd.Timestamp('2014-07-01')
days_delta = pd.Timedelta(days=30)
# Churned: no trip in the 30 days before the reference date
copy['days_since_last_used'] = today - copy['date_last_trip']
copy['churn'] = copy['days_since_last_used'] > days_delta
# New user: signed up fewer than 30 days before the reference date
copy['days_since_signup'] = today - copy['date_signup']
copy['new'] = copy['days_since_signup'] < days_delta
# Drop the columns the label was derived from (prevents target leakage)
copy = copy.drop(['date_last_trip', 'days_since_last_used',
                  'last_trip_date', 'signup_date', 'date_signup',
                  'days_since_signup', 'new'], axis=1)
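To make the labelling rule concrete, here is a minimal sketch on a toy frame. The three rows and the name `rides` are made up for illustration. The column names follow the snippet above, and the sketch assumes `date_last_trip` is the parsed form of `last_trip_date`. The third row sits exactly on the 30-day boundary and is not labelled as churn, because the comparison is strict.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the snippet above.
rides = pd.DataFrame({
    "last_trip_date": ["2014-06-29", "2014-05-01", "2014-06-01"],
    "avg_dist": [3.2, 8.1, 5.0],
})

today = pd.Timestamp("2014-07-01")
days_delta = pd.Timedelta(days=30)

# Parse the raw strings so that subtraction yields Timedelta values.
rides["date_last_trip"] = pd.to_datetime(rides["last_trip_date"])

# Strictly more than 30 days without a trip counts as churn.
rides["churn"] = (today - rides["date_last_trip"]) > days_delta

# Drop every column the label was computed from, keeping only
# genuine features and the target.
features = rides.drop(["last_trip_date", "date_last_trip"], axis=1)
print(features)
```

If the date columns stayed in the frame, a model could recover the label directly from them, which is the leakage the drop is meant to prevent.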
We examined 5 classifiers:
f1 score: 0.839278289993
params: {
    'max_features': 'sqrt',
    'n_estimators': 1000,
    'learning_rate': 0.05,
    'max_depth': 4
}
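The parameter names above match constructor arguments of scikit-learn's `GradientBoostingClassifier`. The text does not say which of the five classifiers these parameters belong to, so treating them as gradient boosting settings is an assumption. The sketch below shows how such a dictionary could be passed to that estimator. It uses synthetic data from `make_classification` as a stand-in for the real feature matrix, so its score says nothing about the results reported here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Assumption: the tuned model is a gradient-boosted tree ensemble.
params = {
    "max_features": "sqrt",
    "n_estimators": 1000,
    "learning_rate": 0.05,
    "max_depth": 4,
}

# Synthetic stand-in data; the real features are not reproduced here.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Unpack the dictionary into the constructor, then fit and score.
model = GradientBoostingClassifier(random_state=0, **params)
model.fit(X_train, y_train)
score = f1_score(y_test, model.predict(X_test))
print(f"f1 on synthetic hold-out: {score:.3f}")
```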
f1_score -> 0.838952585961
r2_score -> 0.124212558345
precision_score -> 0.813991763592
accuracy_score -> 0.793792696298
recall_score -> 0.86550548888
mean_squared_error -> 0.206207303702
roc_auc_score -> 0.771004947189
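In the listing above, accuracy_score and mean_squared_error sum to 1. For 0/1 labels this is expected: each squared error is 1 for a wrong prediction and 0 for a correct one, so the mean squared error is simply the error rate. The sketch below recomputes the threshold-based metrics from confusion-matrix counts in plain Python. The helper `binary_metrics` and the eight example labels are illustrative, not part of the original analysis, and the helper assumes at least one actual and one predicted positive.

```python
def binary_metrics(y_true, y_pred):
    """Compute common metrics for 0/1 labels from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    n = len(pairs)

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / n

    # Squared error is 1 for a miss and 0 for a hit, so mse == 1 - accuracy.
    ss_res = sum((t - p) ** 2 for t, p in pairs)
    mse = ss_res / n

    # R^2 compares the residuals with the spread of the labels around their mean.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot

    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": accuracy, "mse": mse, "r2": r2}


# Made-up labels: 1 = churned, 0 = retained.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name} -> {value:.4f}")
```

Since both values are derived from hard 0/1 predictions, mean_squared_error adds no information beyond accuracy, and r2_score is a rescaling of the same error count by the class balance. The F1, precision, recall, and ROC AUC figures are the more informative ones for comparing classifiers.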
avg_dist 0.1824
weekday_pct 0.163
trips_in_first_30_days 0.1386
surge_pct 0.1141
avg_surge 0.1007
avg_rating_by_driver 0.0906
avg_rating_of_driver 0.0767
city_Kings Landing 0.0271
luxury_car_user 0.0248
city_Astapor 0.0193
phone_Android 0.0135
phone_iPhone 0.0134
city_Winterfell 0.0131
rated_driver_False 0.012
rated_driver_True 0.0106
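Columns such as city_Astapor or phone_iPhone look like one-hot encodings of a single categorical variable, which spreads that variable's importance over several rows. Assuming that reading is right, the sketch below parses the listing above and sums importances per original variable. The prefix list `ONE_HOT_PREFIXES` and the helper names are illustrative choices, not part of the original analysis.

```python
LISTING = """\
avg_dist 0.1824
weekday_pct 0.163
trips_in_first_30_days 0.1386
surge_pct 0.1141
avg_surge 0.1007
avg_rating_by_driver 0.0906
avg_rating_of_driver 0.0767
city_Kings Landing 0.0271
luxury_car_user 0.0248
city_Astapor 0.0193
phone_Android 0.0135
phone_iPhone 0.0134
city_Winterfell 0.0131
rated_driver_False 0.012
rated_driver_True 0.0106
"""

# Assumption: these prefixes mark one-hot encoded categorical variables.
ONE_HOT_PREFIXES = ("city", "phone", "rated_driver")


def parse_importances(listing):
    """Map feature name -> importance; names may contain spaces."""
    importances = {}
    for line in listing.strip().splitlines():
        # Split on the last run of whitespace only.
        name, value = line.rsplit(None, 1)
        importances[name] = float(value)
    return importances


def group_one_hot(importances):
    """Sum the importances of dummy columns that share a prefix."""
    grouped = {}
    for name, value in importances.items():
        key = next(
            (p for p in ONE_HOT_PREFIXES if name.startswith(p + "_")), name
        )
        grouped[key] = grouped.get(key, 0.0) + value
    # Highest importance first.
    return dict(sorted(grouped.items(), key=lambda kv: kv[1], reverse=True))


importances = parse_importances(LISTING)
grouped = group_one_hot(importances)
for name, value in grouped.items():
    print(f"{name:<24}{value:.4f}")
```

Grouping leaves the continuous features at the top of the ranking, while the city dummies together account for about 0.06 of the total importance.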
f1_score -> 0.831244719975
r2_score -> 0.0743100110721
precision_score -> 0.806983691001
accuracy_score -> 0.78287255563
recall_score -> 0.857043512597
mean_squared_error -> 0.21712744437
roc_auc_score -> 0.758442903427
avg_dist 0.2377
weekday_pct 0.1394
trips_in_first_30_days 0.1231
surge_pct 0.1008
avg_surge 0.1004
avg_rating_by_driver 0.0889
avg_rating_of_driver 0.0821
city_Kings Landing 0.0247
luxury_car_user 0.0226
city_Astapor 0.0178
phone_iPhone 0.0142
phone_Android 0.014
city_Winterfell 0.0138
rated_driver_False 0.0108
rated_driver_True 0.0098