Data Scientist @HaystaxTech, Ph.D. Candidate @UNLV, Bayesian Machine Learning Researcher, Organizer of Data Science Meetups. User of #PyMC3.
Capturing Latent Patterns With Unknown Number of Activities
Department of Civil Engineering
University of Nevada Las Vegas
2017 TRB Data Analytics Contest
Our Research Interest
- Currently, transport planners are looking into alternative datasets for transport planning, e.g.
- GPS, social media, Bluetooth, cellphone, etc.
- Call Detail Records (CDR): from cell phones
- However, activity-based travel patterns from these data are not clear and obvious, e.g.
- activities of callers?
- periodic variations of travel?
- distinct behavioral clusters of travelers?
- What data were we given?
- An Origin-Destination matrix with:
- Origin zone
- Destination zone
- Trip purpose
- Start day of trip
- End day of trip
- Hour period of trip
- Class of traveler, e.g. home worker
- What was our objective?
- To capture the activities (trip purpose) in an unsupervised way
Related Work & Limitation
- To capture these latent patterns, the current state-of-the-art has proposed parametric models, e.g.
1. Hierarchical Hidden Semi-Markov Model (Baratchi et al., 2014)
2. Topic Model (Farrahi and Gatica, 2014)
3. Relational Markov Network (Widhalm et al., 2015)
4. Hidden Semi-Markov Model (Paiement et al., 2015)
- Simple models may under-represent the complexity in the data.
- Overly complex models may overfit the data and are expensive to compute.
- Proposed a Bayesian nonparametric approach to develop a flexible statistical model:
- Hierarchical Segmented Infinite Hidden Markov Model (H-siHMM)
Nonparametric models make weaker assumptions, which lets their complexity grow with the size and complexity of the observed data.
- Place a Dirichlet Process (DP) prior on the mixing components, so that their number is inferred from the data rather than fixed in advance
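To illustrate how a DP prior avoids fixing the number of components, the sketch below draws mixture weights with a truncated stick-breaking construction. It shows a plain DP only, not the hierarchical H-siHMM from these slides; the function name, the concentration value `alpha=1.0`, and the truncation level of 20 are assumptions chosen for illustration.

```python
import random


def stick_breaking_weights(alpha, num_sticks, rng):
    """Draw truncated stick-breaking weights for a DP with concentration alpha.

    Returns the list of weights and the leftover stick length that the
    truncation did not assign to any component.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet assigned
    for _ in range(num_sticks):
        frac = rng.betavariate(1.0, alpha)  # break fraction ~ Beta(1, alpha)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    return weights, remaining


rng = random.Random(0)  # fixed seed so the draw is repeatable
weights, leftover = stick_breaking_weights(alpha=1.0, num_sticks=20, rng=rng)

# Only a few components usually carry most of the mass; the rest are near zero.
largest = sorted(weights, reverse=True)[:3]
print("three largest weights:", largest)
print("unassigned mass:", leftover)
```

A smaller `alpha` tends to concentrate the mass on fewer components, while a larger `alpha` spreads it over more of them.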
- Used trip counts on weekdays in destination zone “977”
- All the trips for the same hourly period were gathered together
- Resulting in 24 time steps with several observations of trip counts
- We needed fast Bayesian inference: MCMC sampling using Hamiltonian Monte Carlo
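The data preparation described above (weekday trips into one destination zone, pooled by hourly period) could look roughly like the sketch below. The record layout, the day labels, and every trip count are hypothetical placeholders, not the contest data.

```python
# Hypothetical records: (destination_zone, day_of_week, hour, trip_count).
# These values are made up for illustration only.
records = [
    ("977", "Mon", 8, 120),
    ("977", "Tue", 8, 135),
    ("977", "Sat", 8, 40),    # weekend, excluded
    ("977", "Mon", 17, 210),
    ("312", "Mon", 8, 75),    # different zone, excluded
]

WEEKDAYS = {"Mon", "Tue", "Wed", "Thu", "Fri"}


def group_by_hour(records, zone):
    """Pool weekday trip counts for one destination zone by hourly period."""
    by_hour = {hour: [] for hour in range(24)}  # one entry per time step
    for dest, day, hour, count in records:
        if dest == zone and day in WEEKDAYS:
            by_hour[hour].append(count)
    return by_hour


observations = group_by_hour(records, zone="977")
print(len(observations), "time steps")
print("hour 8 observations:", observations[8])
```

Each of the 24 time steps then holds a list of observed trip counts, one per weekday on which data were recorded for that hour.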
Preliminary Results (1/3)
- The proposed model captured 3 hidden activities (labeled “A1”, “A2”, and “A3” in figures below)
They match ground truth activities “Other”, “At home” and “Other”, respectively.
Preliminary Results (2/3)
- Graph for periods T5 up to T23
Preliminary Results (3/3)
- Statistical properties of captured activities
- mu = location/center
- sigma = standard deviation
- theta = probability of transition between activities
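To show how mu, sigma, and theta fit together, the sketch below simulates trip counts from a three-state hidden Markov model. It assumes Gaussian emissions, which the slides do not state explicitly, and every numeric value is a hypothetical placeholder rather than a fitted result.

```python
import random

# Hypothetical parameters for three activities; NOT the fitted values.
mu = {"A1": 50.0, "A2": 200.0, "A3": 120.0}     # location/center
sigma = {"A1": 10.0, "A2": 30.0, "A3": 20.0}    # standard deviation
theta = {                                        # transition probabilities
    "A1": {"A1": 0.8, "A2": 0.1, "A3": 0.1},
    "A2": {"A1": 0.1, "A2": 0.8, "A3": 0.1},
    "A3": {"A1": 0.2, "A2": 0.2, "A3": 0.6},
}


def simulate(start, steps, rng):
    """Simulate hidden activities and Gaussian trip counts for `steps` periods."""
    state = start
    path, counts = [], []
    for _ in range(steps):
        path.append(state)
        # Emit an observation from the current activity's distribution.
        counts.append(rng.gauss(mu[state], sigma[state]))
        # Move to the next activity according to the current row of theta.
        next_states = list(theta[state])
        probs = [theta[state][s] for s in next_states]
        state = rng.choices(next_states, weights=probs)[0]
    return path, counts


rng = random.Random(42)  # fixed seed so the simulation is repeatable
path, counts = simulate(start="A1", steps=24, rng=rng)
print(path)
```

Inference runs in the opposite direction: given observed counts, it estimates the hidden path together with mu, sigma, and theta.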
On-going & Future Work
- At least four on-going improvements:
- Use disaggregated data
- Model several regions simultaneously
- Add other rich domain information, e.g. land use
- Add models for weekends
By Daniel Emaasit