Real-time pre-authorization in ride-hailing: switching from rules to ML

Andrey Lukyanenko

Senior DS @ Careem

About me

  • ~4 years as an ERP-system consultant
  • DS since 2017
  • Led a medical chatbot project
  • Led an R&D CV team
  • Senior DS at Careem: anti-fraud, recommendation system, LLM-based products

A slide about Careem

Contents

  • Pre-authorization: what is it?
  • Why did we decide to switch from rules to ML?
  • Challenges of building the model
  • Data Preparation
  • Model training
  • Model deployment
  • Results

Pre-authorization

Rules vs ML models

Challenges of building the model

  • Measuring real-world performance: when the new model declines a transaction, we never find out whether it was actually fraudulent
  • Prediction threshold optimization
  • Latency
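Here is a minimal sketch of threshold selection. It assumes labeled history, meaning a fraud score and a 0/1 fraud label for each transaction. It also assumes two hypothetical per-transaction costs: the loss from approving a fraudulent pre-authorization and the cost of blocking a legitimate customer. The function name `best_threshold` and the cost values are illustrative, not taken from the talk. The function picks the cutoff with the lowest total cost.

```python
from typing import Sequence, Tuple


def best_threshold(
    scores: Sequence[float],
    labels: Sequence[int],
    fraud_loss: float,
    false_block_cost: float,
) -> Tuple[float, float]:
    """Return (threshold, total_cost) with the lowest cost on labeled history.

    A transaction is blocked when score >= threshold.
    labels: 1 = fraudulent, 0 = legitimate.
    fraud_loss / false_block_cost are hypothetical per-transaction costs.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    # Candidate cutoffs: every observed score, plus infinity (= block nothing).
    candidates = sorted(set(scores)) + [float("inf")]
    best = (candidates[-1], float("inf"))
    for t in candidates:
        cost = 0.0
        for s, y in zip(scores, labels):
            blocked = s >= t
            if y == 1 and not blocked:
                cost += fraud_loss  # fraud slipped through
            elif y == 0 and blocked:
                cost += false_block_cost  # legitimate customer declined
        # Strict "<" keeps the lowest threshold among equally cheap options.
        if cost < best[1]:
            best = (t, cost)
    return best
```

For example, `best_threshold([0.1, 0.4, 0.35, 0.8, 0.9], [0, 0, 1, 1, 1], 10.0, 1.0)` returns `(0.35, 1.0)`. The double loop is quadratic, so a single pass over sorted scores is preferable on large histories. Historical labels are likely to exist only for transactions that the existing rules let through, which is the measurement problem described in the first bullet.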

Data Preparation

  • How much historical data to use?
  • Feature engineering and selection
  • Prevent leakage: use only data that is available at the moment the prediction is made
  • Check data for discrepancies
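To make the leakage point concrete, here is a minimal sketch of a point-in-time feature. For each pre-authorization request, it counts the user's earlier trips and uses only events timestamped strictly before the request. The tuple layout and the name `trips_before` are hypothetical. In practice this would more likely be a time-aware join in the data warehouse.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Sequence, Tuple


def trips_before(
    trips: Sequence[Tuple[str, datetime]],
    requests: Sequence[Tuple[str, datetime]],
) -> List[int]:
    """For each (user, request_time), count the user's trips strictly before request_time.

    Trips at or after the request time are excluded: they would not exist yet
    when the model has to score the request in production.
    """
    by_user: Dict[str, List[datetime]] = defaultdict(list)
    for user, ts in trips:
        by_user[user].append(ts)
    for times in by_user.values():
        times.sort()
    # bisect_left gives the number of timestamps that are < request_time.
    return [bisect_left(by_user.get(user, []), t) for user, t in requests]
```

A trip that happens at the exact request time is not counted, so a request at 10:00 sees only trips completed before 10:00.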

Model training

Model deployment

  • Making features available in real time
  • Checking for discrepancies between training and production data
  • Internal system for model training and deployment on AWS
  • Running the model in shadow mode
-- Daily count of 'special' trips per user; {date} is a template parameter
SELECT user, COUNT(*) AS special_trips
  FROM user_trips
 WHERE trip_type = 'special'
   AND day = {date}
 GROUP BY user
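One simple way to check for discrepancies between training and production data is to log the features that the real-time service actually used, then recompute the same features offline for the same requests and measure how often the two disagree. The sketch below assumes both sides are keyed by a hypothetical request ID and hold numeric values. The tolerances and the name `feature_mismatch_rates` are illustrative.

```python
import math
from typing import Dict, Mapping, Set


def feature_mismatch_rates(
    offline: Mapping[str, Mapping[str, float]],
    online: Mapping[str, Mapping[str, float]],
    rel_tol: float = 1e-6,
    abs_tol: float = 1e-9,
) -> Dict[str, float]:
    """Per feature, the share of shared requests where offline and online values differ.

    offline / online map request_id -> {feature_name: value}.
    A feature missing on one side for a request counts as a mismatch.
    """
    shared_ids = offline.keys() & online.keys()
    features: Set[str] = set()
    for rid in shared_ids:
        features |= offline[rid].keys() | online[rid].keys()
    rates: Dict[str, float] = {}
    for name in sorted(features):
        mismatches = 0
        for rid in shared_ids:
            a = offline[rid].get(name)
            b = online[rid].get(name)
            if a is None or b is None or not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
                mismatches += 1
        rates[name] = mismatches / len(shared_ids)
    return rates
```

A non-zero rate for a count feature can point to a timing difference. For example, a batch job may aggregate a full day, while the online store only sees events up to the moment of the request.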

Results

Contacts

pre_auth
