H1: Predict future congestion to optimize infrastructure spending.
H2: Determine possible routes optimized for ride-sharing capabilities and find good shared pickup points
Data Preparation
- Location ID clarification
- Date standardization
- Removal of unnecessary variables
- Remove outliers
- Combine green & yellow datasets
- Rename similar variables
- Check similarity between the 2015 and 2016 data
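The preparation steps above could be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual pipeline: the column names (`tpep_pickup_datetime`, `lpep_pickup_datetime`, `PULocationID`, `trip_distance`), the two date formats, and the `MAX_TRIP_DISTANCE` threshold are all assumptions, and the sample rows are made up.

```python
from datetime import datetime

# Hypothetical raw rows; column names and values are assumptions for illustration.
yellow = [
    {"tpep_pickup_datetime": "2015-03-01 08:15:00", "PULocationID": "161", "trip_distance": "2.4"},
    {"tpep_pickup_datetime": "2015-03-01 08:20:00", "PULocationID": "237", "trip_distance": "950.0"},
]
green = [
    {"lpep_pickup_datetime": "03/01/2016 17:45", "PULocationID": "74", "trip_distance": "1.1"},
]

# Rename similar variables so both datasets share one schema.
RENAMES = {
    "tpep_pickup_datetime": "pickup_datetime",
    "lpep_pickup_datetime": "pickup_datetime",
    "PULocationID": "pickup_location_id",
}
DATE_FORMATS = ("%Y-%m-%d %H:%M:%S", "%m/%d/%Y %H:%M")  # assumed input formats
MAX_TRIP_DISTANCE = 100.0  # assumed outlier threshold, in the dataset's own unit


def parse_date(text):
    """Date standardization: try each known format and return a datetime."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {text!r}")


def prepare(rows, source):
    """Rename columns, convert types, tag the source, and drop outliers."""
    cleaned = []
    for row in rows:
        record = {RENAMES.get(key, key): value for key, value in row.items()}
        record["pickup_datetime"] = parse_date(record["pickup_datetime"])
        record["pickup_location_id"] = int(record["pickup_location_id"])
        record["trip_distance"] = float(record["trip_distance"])
        record["source"] = source
        if 0 < record["trip_distance"] <= MAX_TRIP_DISTANCE:
            cleaned.append(record)
    return cleaned


# Combine green & yellow datasets into one list of trips.
trips = prepare(yellow, "yellow") + prepare(green, "green")
```

On real data volumes a dataframe library would likely be a better fit; the plain-dictionary version is only meant to show the order of the steps.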
Modelling
H1: Identify highly congested areas during rush hours.
Forecast a time series for each of these “hot spots” to predict the probability of future congestion.
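One possible shape for the H1 step is sketched below: count rush-hour pickups per zone to rank hot spots, then forecast the next value of a hot spot's trip-count series. Simple exponential smoothing is used here only as a stand-in for whichever forecasting model is eventually chosen, and it yields a point forecast rather than a probability. The `RUSH_HOURS` set, the function names, and all sample values are assumptions.

```python
from collections import Counter

RUSH_HOURS = {7, 8, 9, 16, 17, 18}  # assumed definition of rush hours

# Hypothetical (zone_id, pickup_hour) pairs.
pickups = [
    (161, 8), (161, 8), (161, 17), (161, 9),
    (237, 8), (237, 13), (237, 18),
    (74, 17),
]


def hot_spots(records, top_n):
    """Return the top_n zones ranked by number of rush-hour pickups."""
    counts = Counter(zone for zone, hour in records if hour in RUSH_HOURS)
    return [zone for zone, _ in counts.most_common(top_n)]


def exp_smooth_forecast(series, alpha=0.5):
    """One-step-ahead forecast using simple exponential smoothing."""
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


top = hot_spots(pickups, top_n=2)

# Hypothetical weekly rush-hour trip counts for one hot spot.
weekly_counts = [100, 120, 110, 130]
forecast = exp_smooth_forecast(weekly_counts)
```

Comparing `forecast` against a congestion threshold would give a yes/no prediction per hot spot, which is the kind of label the evaluation step needs.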
H2: K-means cluster analysis to locate convenient common pickup/drop-off locations, keeping the walking distance from one location to another under 500 m.
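A minimal sketch of the H2 clustering idea follows, assuming request locations have already been projected to planar coordinates in metres. The 500 m limit is read here as the longest walk from a request to its assigned shared point, which is one interpretation of the constraint. Farthest-first initialization replaces the usual random start so the example is deterministic; the function names and sample points are hypothetical.

```python
import math

MAX_WALK_M = 500.0  # walking-distance limit from the plan

# Hypothetical request locations, as (x, y) in metres.
points = [(0, 0), (100, 50), (50, 100), (2000, 2000), (2100, 1950), (2050, 2100)]


def farthest_first(pts, k):
    """Deterministic initialization: repeatedly add the point farthest from the chosen centroids."""
    centroids = [pts[0]]
    while len(centroids) < k:
        centroids.append(max(pts, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def assign(pts, centroids):
    """Assign every point to its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in pts:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(pts, k, iterations=50):
    """Plain k-means: alternate assignment and centroid update until stable."""
    centroids = farthest_first(pts, k)
    for _ in range(iterations):
        clusters = assign(pts, centroids)
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(pts, centroids)


def max_walk(centroids, clusters):
    """Longest distance from any point to its assigned shared pickup point."""
    return max(math.dist(p, c) for c, cluster in zip(centroids, clusters) for p in cluster)


centroids, clusters = kmeans(points, k=2)
within_limit = max_walk(centroids, clusters) <= MAX_WALK_M
```

If `within_limit` is false for a given `k`, more shared locations are needed; how to choose the final number is left to the evaluation step.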
Evaluation
- Compare future with current congestion (1) (ROC curve, precision/recall)
- A stability-based model would be used to identify how many pickup/drop-off locations will be needed.
- Total distance traveled by all taxis should be lower once ride sharing is taken into account.
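As an illustration of the precision/recall check named in the first evaluation bullet, the sketch below compares predicted congestion labels against observed ones. The labels are made up, and the function name is hypothetical; a ROC curve would additionally need predicted scores rather than hard labels.

```python
def precision_recall(actual, predicted):
    """Compute precision and recall for binary labels (1 = congested)."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a and p)
    false_pos = sum(1 for a, p in pairs if not a and p)
    false_neg = sum(1 for a, p in pairs if a and not p)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Hypothetical labels per zone and period: 1 = congested, 0 = not congested.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

precision, recall = precision_recall(actual, predicted)
```

With these sample labels there are three true positives, one false positive, and one false negative, so both precision and recall come out at 0.75.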