New York Taxi Cabs - F
Knowledge Discovery
Data Understanding
Yellow - 13587
Green - 6000
Summary statistics
Clip outliers lying more than 1.5 × IQR beyond the quartiles
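The 1.5 IQR rule can be sketched as follows. This is a minimal illustration using only the Python standard library; the helper name `clip_iqr` and the sample trip distances are hypothetical and not taken from the taxi data.

```python
import statistics

def clip_iqr(values, k=1.5):
    """Clip each value to the range [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]

# Hypothetical trip distances (miles) with one implausible entry.
trip_distances = [0.8, 1.2, 1.5, 2.0, 2.3, 3.1, 4.0, 250.0]
clipped = clip_iqr(trip_distances)
print(clipped)
```

Clipping keeps every row and only caps the extreme values; dropping the rows instead would be the alternative if the outliers are believed to be recording errors.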
Hypotheses
H1: Predict future congestion to optimize infrastructure spending.
H2: Identify routes suited to ride-sharing and find good shared pick-up points.
Data Preparation
- Location ID clarification
- Date standardization
- Removal of unnecessary variables
- Remove outliers
- Combine green & yellow datasets
- Rename similar variables
- Check similarity between the 2015 and 2016 data
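Several of the preparation steps above (renaming similar variables, date standardization, removal of unnecessary variables, combining green and yellow) might look roughly like this. The column names, the date format and the sample rows are assumptions made for illustration; the real files may differ.

```python
from datetime import datetime

# Assumed column names: the two datasets prefix their timestamps differently.
RENAMES = {
    "tpep_pickup_datetime": "pickup_datetime",
    "tpep_dropoff_datetime": "dropoff_datetime",
    "lpep_pickup_datetime": "pickup_datetime",
    "lpep_dropoff_datetime": "dropoff_datetime",
}
# Hypothetical choice of variables to keep; everything else is dropped.
KEEP = ("pickup_datetime", "dropoff_datetime", "passenger_count", "trip_distance")
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format

def standardize(row, taxi_type):
    """Rename columns, keep the shared subset, and parse the timestamps."""
    renamed = {RENAMES.get(name, name): value for name, value in row.items()}
    out = {name: renamed[name] for name in KEEP}
    for name in ("pickup_datetime", "dropoff_datetime"):
        out[name] = datetime.strptime(out[name], DATE_FORMAT)
    out["taxi_type"] = taxi_type  # remember where each row came from
    return out

# Hypothetical sample rows, one per dataset.
yellow = [{"tpep_pickup_datetime": "2016-01-01 08:05:00",
           "tpep_dropoff_datetime": "2016-01-01 08:20:00",
           "passenger_count": "1", "trip_distance": "2.1", "VendorID": "2"}]
green = [{"lpep_pickup_datetime": "2016-01-01 17:30:00",
          "lpep_dropoff_datetime": "2016-01-01 17:42:00",
          "passenger_count": "2", "trip_distance": "1.4", "VendorID": "1"}]

trips = ([standardize(r, "yellow") for r in yellow]
         + [standardize(r, "green") for r in green])
print(trips)
```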
Modelling
H1: Identify highly congested areas during rush hours, then forecast time series for these “hot spots” to predict the probability of future congestion.
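As a rough baseline for H1, the probability of congestion in a zone and hour can be estimated as the fraction of past days on which that zone-hour exceeded a threshold. This is an empirical-frequency sketch, not a full time-series forecast. Using pickup counts as a proxy for congestion, the rush-hour set, the threshold and the sample counts are all assumptions.

```python
from collections import defaultdict

RUSH_HOURS = {7, 8, 9, 16, 17, 18}   # assumed definition of rush hours
CONGESTION_THRESHOLD = 100           # hypothetical pickups per zone-hour

def congestion_probability(counts):
    """counts maps (zone, day, hour) to pickups.

    Returns {(zone, hour): share of observed days above the threshold},
    restricted to rush hours.
    """
    above = defaultdict(int)
    total = defaultdict(int)
    for (zone, day, hour), pickups in counts.items():
        if hour in RUSH_HOURS:
            total[(zone, hour)] += 1
            above[(zone, hour)] += int(pickups > CONGESTION_THRESHOLD)
    return {key: above[key] / total[key] for key in total}

# Hypothetical pickup counts for two zones over four days.
counts = {
    ("A", 1, 8): 140, ("A", 2, 8): 120, ("A", 3, 8): 90, ("A", 4, 8): 130,
    ("B", 1, 8): 40, ("B", 2, 8): 110, ("B", 3, 8): 30, ("B", 4, 8): 20,
    ("A", 1, 13): 300,  # outside rush hours, so ignored
}
prob = congestion_probability(counts)
hot_spots = sorted(key for key, p in prob.items() if p >= 0.5)
print(prob, hot_spots)
```

A seasonal forecasting model fitted per hot spot would replace the frequency estimate here; the baseline mainly gives something to compare such a model against.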
H2: Use k-means cluster analysis to locate common, convenient pick-up/drop-off locations that keep the walking distance from one location to another below 500 m.
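One way to read the 500 m constraint is to increase the number of clusters until every pickup lies within 500 m of its nearest cluster centre. The sketch below implements plain k-means (Lloyd's algorithm) in standard-library Python under that reading. It assumes the coordinates are already projected to metres; the function names and sample points are hypothetical.

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm on 2-D points given as (x, y) tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        new = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids

def max_walk(points, centroids):
    """Largest distance from any point to its nearest centroid."""
    return max(min(math.dist(p, c) for c in centroids) for p in points)

def shared_points(points, limit_m=500.0):
    """Smallest k whose centroids keep every point within limit_m."""
    if not points:
        raise ValueError("no points given")
    for k in range(1, len(points) + 1):
        centroids = kmeans(points, k)
        if max_walk(points, centroids) <= limit_m:
            return centroids

# Hypothetical pickup coordinates in metres (two groups about 2 km apart).
pickups = [(0, 0), (100, 50), (50, 120), (2000, 0), (2100, 80), (1950, -60)]
stops = shared_points(pickups)
print(len(stops), max_walk(pickups, stops))
```

Measuring straight-line distance to the cluster centre is a simplification; actual walking distance along streets would be longer.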
Evaluation
H1
- Compare future with current congestion (1), using ROC curves and precision/recall.
H2
- A stability-based model would be used to identify how many pick-up/drop-off locations will be needed.
- Total distance traveled by all taxis should be lower when taking ride-sharing into account.
- Compare the mean number of passengers.
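The evaluation measures can be illustrated with a short sketch: precision and recall for the H1 congestion predictions, and total distance plus mean passengers for the H2 comparison. All data below is hypothetical and only shows the shape of the calculation.

```python
def precision_recall(actual, predicted):
    """Precision and recall for boolean congestion flags."""
    tp = sum(a and p for a, p in zip(actual, predicted))
    fp = sum((not a) and p for a, p in zip(actual, predicted))
    fn = sum(a and (not p) for a, p in zip(actual, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def summarize(trips):
    """Total distance and mean passenger count over (distance_km, passengers) pairs."""
    total_distance = sum(d for d, _ in trips)
    mean_passengers = sum(n for _, n in trips) / len(trips)
    return total_distance, mean_passengers

# H1: hypothetical observed vs. predicted congestion per zone-hour.
actual = [True, True, False, False, True, False]
predicted = [True, False, False, True, True, False]
precision, recall = precision_recall(actual, predicted)

# H2: hypothetical trips without and with ride-sharing.
solo_trips = [(3.0, 1), (3.2, 1), (2.8, 2)]
shared_trips = [(4.1, 2), (3.0, 2)]
solo_total, solo_mean = summarize(solo_trips)
shared_total, shared_mean = summarize(shared_trips)
print(precision, recall, solo_total, shared_total, solo_mean, shared_mean)
```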
Sources
(1) http://www.pnas.org/content/114/3/462.abstract
knowledgediscovery1
By laurenstc