CityBikes

Patrick Merlot (Feb. 2016)

Data Scientist intern @ iKnow Solutions Norge AS

Forecasting bikes/docks shortage

Optimal path to imbalanced stations

The Project

The Goal
Facilitate the maintenance of a bike-sharing system

The Project

Key Metrics — What to improve?

  • Shortage duration @ station
  • % bikes transported by truck
  • User satisfaction

Figure: bike/dock availability @ Market at Sansome station
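Shortage duration can be measured directly from the per-minute status data. A minimal sketch, assuming each status record for one station is a hypothetical `(timestamp, bikes_available, docks_available)` tuple sampled once per minute, so counting records equals counting minutes; the function name `shortage_minutes` is also illustrative.

```python
from datetime import datetime
from typing import Iterable, Tuple

# Hypothetical record layout: (timestamp, bikes_available, docks_available),
# one record per minute for a single station.
StatusRecord = Tuple[datetime, int, int]


def shortage_minutes(records: Iterable[StatusRecord]) -> dict:
    """Count minutes with no bike to rent and minutes with no free dock."""
    no_bike = 0
    no_dock = 0
    for _timestamp, bikes, docks in records:
        if bikes == 0:
            no_bike += 1  # a user arriving now cannot rent a bike
        if docks == 0:
            no_dock += 1  # a user arriving now cannot return a bike
    return {"no_bike_minutes": no_bike, "no_dock_minutes": no_dock}


if __name__ == "__main__":
    t0 = datetime(2014, 9, 1, 8, 0)
    sample = [
        (t0.replace(minute=0), 2, 13),
        (t0.replace(minute=1), 0, 15),
        (t0.replace(minute=2), 0, 15),
        (t0.replace(minute=3), 15, 0),
    ]
    print(shortage_minutes(sample))  # {'no_bike_minutes': 2, 'no_dock_minutes': 1}
```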

The Project

Business levers — How to improve?

  • Forecasting the next imbalanced stations
  • Calculating an optimized path for replenishment

Visual & intuitive dashboard
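The slides do not say which routing algorithm computes the replenishment path. As one possible starting point, assumed here, a greedy nearest-neighbour tour starts at a depot and always drives to the closest unvisited imbalanced station. This is a heuristic for the travelling-salesman problem, not an optimal solver; the depot, station names and coordinates are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius


def nearest_neighbour_route(depot: Coord, stations: Dict[str, Coord]) -> List[str]:
    """Greedy route: from the current position, go to the closest unvisited station."""
    route: List[str] = []
    current = depot
    remaining = dict(stations)
    while remaining:
        name = min(remaining, key=lambda s: haversine_km(current, remaining[s]))
        route.append(name)
        current = remaining.pop(name)  # the truck is now at this station
    return route
```

Refinements such as 2-opt swaps on the greedy tour, or a truck capacity constraint, would be natural next steps.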

Data Analysis Process

DATA COLLECTION → DATA PREPARATION → EXPLORATORY ANALYSIS → MODELING → RESULTS/VISUALIZATION

Data Analysis Process: DATA COLLECTION

  • Single data source
  • Historical data: year 1 (Aug. 2013 - Aug. 2014) and year 2 (Sep. 2014 - Aug. 2015)
    • station data (5 KB): name, id, coordinates, #docks
    • weather data (155 KB): daily temperature, precipitation, ...
    • trip data (42 MB): tripID, start/stop date and station, userID, ...
    • historical status data (1.1 GB): #freeDocks, #freeBikes per minute
  • Real-time data (every minute): status data in JSON format
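The real-time feed delivers one JSON status snapshot per minute. Its exact layout is not shown in the slides, so the sketch below assumes a hypothetical payload with a `stations` array whose entries carry `id`, `bikes` and `docks`; these field names, the `threshold` value and both function names are illustrative assumptions.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical payload layout (field names are assumptions, not the real feed's):
# {"timestamp": "...", "stations": [{"id": 2, "bikes": 5, "docks": 10}, ...]}


def parse_status(payload: str) -> Dict[int, Tuple[int, int]]:
    """Map station id -> (free bikes, free docks) from one JSON status snapshot."""
    data = json.loads(payload)
    return {
        int(s["id"]): (int(s["bikes"]), int(s["docks"]))
        for s in data["stations"]
    }


def imbalanced(status: Dict[int, Tuple[int, int]], threshold: int = 2) -> List[int]:
    """Station ids that are (nearly) empty or (nearly) full right now."""
    return sorted(sid for sid, (bikes, docks) in status.items()
                  if bikes <= threshold or docks <= threshold)
```

Polled once a minute, each parsed snapshot has the same shape as one row of the historical status data, so live and historical records can feed the same model.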

Data Analysis Process: DATA PREPARATION

  • The data is already clean: easier than the Titanic dataset!
  • Just be careful with timezones!
  • Combining years 1 & 2: field names differ between the two years
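Both pitfalls can be handled while loading each row. A minimal sketch, assuming CSV rows read as dicts (for example with `csv.DictReader`); the column names in `FIELD_MAP`, the date format and the function `normalize_row` are hypothetical, since the slides do not list the actual field names of either year.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Dict

# Hypothetical column names: the real year-1/year-2 headers are not given in the slides.
FIELD_MAP: Dict[str, Dict[str, str]] = {
    "year1": {"Start Date": "start_date", "Start Terminal": "start_station"},
    "year2": {"Start Date": "start_date", "Start Station": "start_station"},
}


def normalize_row(row: Dict[str, str], year: str, local_tz: tzinfo,
                  date_format: str = "%m/%d/%Y %H:%M") -> Dict[str, object]:
    """Rename a row's fields to one common schema and convert its start time to UTC."""
    mapping = FIELD_MAP[year]
    out: Dict[str, object] = {mapping.get(key, key): value for key, value in row.items()}
    naive = datetime.strptime(str(out["start_date"]), date_format)
    # Attach the local zone explicitly, then convert, so both years share one clock.
    out["start_date"] = naive.replace(tzinfo=local_tz).astimezone(timezone.utc)
    return out


if __name__ == "__main__":
    # Fixed UTC-7 offset only to keep the demo self-contained; a real zone
    # (e.g. zoneinfo.ZoneInfo("America/Los_Angeles"), assumed) also handles DST.
    utc_minus_7 = timezone(timedelta(hours=-7))
    row = {"Start Date": "8/29/2013 14:13", "Start Terminal": "66"}
    print(normalize_row(row, "year1", utc_minus_7))
```

Passing a `zoneinfo.ZoneInfo` for the system's city (assumed to be America/Los_Angeles, since Market at Sansome is a San Francisco street corner) lets daylight-saving transitions be applied per timestamp, which a fixed offset cannot do.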

Data Analysis Process: EXPLORATORY ANALYSIS

Tools by category:
  • Visualization libraries
  • Data manipulation
  • Statistics
  • Scientific computing
  • Machine learning
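A typical first exploratory view is the daily availability profile of a station. With a data-manipulation library this is a group-by on the hour; the plain-Python sketch below shows the same idea, assuming hypothetical `(timestamp, free bikes)` records for one station.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def hourly_profile(records: Iterable[Tuple[datetime, int]]) -> Dict[int, float]:
    """Average number of free bikes for each hour of the day (0-23) present in the data."""
    by_hour: Dict[int, List[int]] = defaultdict(list)
    for timestamp, bikes in records:
        by_hour[timestamp.hour].append(bikes)
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}
```

Splitting the same profile by weekday versus weekend is an easy extension for spotting commute-driven imbalance.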

Data Analysis Process: MODELING

  • build a model (features, target values)
  • choice of regression estimators
    • Linear regression
    • Decision Tree Regressor
    • Gradient Boosting Regressor
    • Random Forest Regressor
    • ...
  • testing/scoring/validation
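The listed estimators share the names of scikit-learn's `LinearRegression`, `DecisionTreeRegressor`, `GradientBoostingRegressor` and `RandomForestRegressor`, which accept feature rows `X` and targets `y` through `fit`/`predict`. The sketch below, in plain Python and under stated assumptions, shows the steps around them: building features and a target from one station's per-minute history (the features, the 30-minute horizon and the function names are hypothetical), splitting chronologically, and scoring a persistence baseline that any regressor should beat.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple

# Hypothetical input: one station's history as (timestamp, free bikes),
# one record per minute with no gaps.
Series = Sequence[Tuple[datetime, int]]


def make_dataset(series: Series, horizon: int = 30) -> Tuple[List[List[int]], List[int]]:
    """Features at minute t; target = free bikes `horizon` minutes later."""
    X: List[List[int]] = []
    y: List[int] = []
    for i in range(len(series) - horizon):
        ts, bikes = series[i]
        # Illustrative features only: hour of day, day of week, bikes available now.
        X.append([ts.hour, ts.weekday(), bikes])
        y.append(series[i + horizon][1])
    return X, y


def chronological_split(X, y, test_fraction: float = 0.2):
    """Train on the past, test on the future: no shuffling for time series."""
    cut = int(len(X) * (1 - test_fraction))
    return X[:cut], X[cut:], y[:cut], y[cut:]


def mean_absolute_error(y_true, y_pred) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    t0 = datetime(2014, 9, 1)
    series = [(t0 + timedelta(minutes=i), i % 16) for i in range(200)]  # toy data
    X, y = make_dataset(series)
    X_train, X_test, y_train, y_test = chronological_split(X, y)
    # Persistence baseline: predict that the count in 30 min equals the count now.
    baseline = [row[2] for row in X_test]
    print("baseline MAE:", mean_absolute_error(y_test, baseline))
```

Shuffled cross-validation would let the model peek at minutes adjacent to the test rows; splitting by time keeps the evaluation closer to real forecasting.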

Data Analysis Process: RESULTS/VISUALIZATION

  • Assess the result
  • Iterative process to improve predictions
  • Compute a "Confidence Interval"
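The slides do not say how the interval is computed. One simple option, assumed here, is an empirical interval from the residuals of a validation set: take the central percentiles of `y_true - y_pred` and add them to each new prediction, clipped to the station's capacity. Strictly this is a prediction interval rather than a confidence interval for a parameter; the function names are hypothetical.

```python
from statistics import quantiles
from typing import Sequence, Tuple


def residual_interval(y_true: Sequence[float], y_pred: Sequence[float],
                      coverage: float = 0.9) -> Tuple[float, float]:
    """Lower/upper offsets covering the central `coverage` share of validation residuals."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    pct = quantiles(residuals, n=100, method="inclusive")  # 99 cut points: 1st..99th percentile
    tail = round((1 - coverage) / 2 * 100)  # e.g. 5 for a 90% interval
    return pct[tail - 1], pct[100 - tail - 1]


def predict_interval(prediction: float, offsets: Tuple[float, float],
                     capacity: int) -> Tuple[float, float]:
    """Apply the residual offsets to one prediction."""
    lo, hi = offsets
    # A bike count cannot go below 0 or above the station's number of docks.
    return max(0.0, prediction + lo), min(float(capacity), prediction + hi)
```

Recomputing the offsets after each modeling iteration shows directly whether the improved model also tightens the interval.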

CityBikes appStack

Figure: CityBikes app stack. Producer (events: bike arriving, bike leaving, status) → Kafka cluster → Consumer (Filter, Aggregate); Machine Learning (Build model, Train, Predict); webApp (REST api; Dashboard views: Map/Path, Trips, Station, Weather).
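In the stack, the consumer reads events from the Kafka cluster, then filters and aggregates them before modeling. The sketch below covers only that Filter/Aggregate step; the Kafka client is omitted, and the event schema (`type`, `station_id`, `timestamp`), the 15-minute window and the function names are hypothetical. It sums net bike flow (arrivals minus departures) per station and time window.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Tuple

# Hypothetical event schema; the real message format is not shown in the slides:
# {"type": "bike_arriving" | "bike_leaving" | "status", "station_id": int, "timestamp": datetime}
Event = Dict[str, object]


def window_start(ts: datetime, minutes: int = 15) -> datetime:
    """Truncate a timestamp to the start of its window (`minutes` should divide 60)."""
    return ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)


def aggregate_net_flow(events: Iterable[Event],
                       minutes: int = 15) -> Dict[Tuple[int, datetime], int]:
    """Filter arrival/departure events and sum net bike flow per (station, window)."""
    flow: Counter = Counter()
    for event in events:
        kind = event["type"]
        if kind not in ("bike_arriving", "bike_leaving"):
            continue  # Filter step: status snapshots are handled elsewhere
        key = (event["station_id"], window_start(event["timestamp"], minutes))
        flow[key] += 1 if kind == "bike_arriving" else -1
    return dict(flow)
```

A strongly negative window flags a station emptying out, which is the kind of signal the Predict step and the Map/Path view consume.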

 

What's Next?

  • More statistics & ML
  • Master the tools
  • Check this blog post!
    Almost there :)

Thanks Dirk

Thank you all ;-)

CityBikes, Forecasting maintenance of a bike-sharing system

By Patrick Merlot
