Capturing Latent Patterns With Unknown Number of Activities

Daniel Emaasit

PhD Student

Department of Civil Engineering

University of Nevada Las Vegas


2017 TRB Data Analytics Contest

Our Research Interest

  • Currently, transport planners are looking into alternative datasets for transport planning e.g.

- GPS, social media, Bluetooth, cellphone, etc.

Call Detail Records (CDR): from cell phones

  • However, activity-based travel patterns from these data are not clear & obvious e.g.
    • activities of callers?
    • periodic variations of travel?
    • distinct behavioral clusters of travelers?

Our Objective

  • What data were we given?
    • An Origin-Destination matrix with:
  1. Origin zone
  2. Destination zone
  3. Trip purpose
  4. Start day of trip
  5. End day of trip
  6. Hour period of trip
  7. Class of traveler, e.g. Home worker
  • What was our objective?
    • To capture the activities (trip purpose) in an unsupervised way
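As a rough illustration, the seven fields of the Origin-Destination matrix could be held in one record type per row. This is only a sketch: the field names, the types, the 0-23 hour convention and the example values (other than destination zone "977", which appears later in the deck) are assumptions, not the contest's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ODRecord:
    """One row of the Origin-Destination matrix (hypothetical field names)."""
    origin_zone: str
    destination_zone: str
    trip_purpose: str     # the ground-truth activity label, e.g. "At home"
    start_day: str
    end_day: str
    hour_period: int      # assumed to run from 0 to 23
    traveler_class: str   # e.g. "Home worker"


# Made-up example row; only the destination zone "977" comes from the slides.
example = ODRecord(
    origin_zone="101",
    destination_zone="977",
    trip_purpose="Other",
    start_day="Mon",
    end_day="Mon",
    hour_period=8,
    traveler_class="Home worker",
)
```

In the unsupervised setting described here, `trip_purpose` would be withheld from the model and used only to check the recovered activities afterwards.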

Related Work & Limitation

  • To capture these latent patterns, the current state of the art has proposed parametric models, e.g.

1. Hierarchical Hidden Semi-Markov Model (Baratchi et al., 2014)

2. Topic Model (Farrahi and Gatica, 2014)

3. Relational Markov Network (Widhalm et al., 2015)

4. Hidden Semi-Markov Model (Paiement et al., 2015)

... etc.

  • Limitation: They impose a priori bounds on the model complexity (i.e. the number of latent activities is pre-specified)
    • Simple models may under-represent the complexity in the data
    • Overly complex models may overfit the data and are expensive to compute

Proposed Solution

  • Proposed a Bayesian nonparametric approach to develop a flexible statistical model
  • Hierarchical Segmented Infinite Hidden Markov Model (H-siHMM)

Nonparametric models make weaker assumptions, which permits their complexity to grow with the size and complexity of the observed data

- Place a Dirichlet Process (DP) prior on the mixing components, so that their number does not have to be fixed in advance
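To give a feel for how a Dirichlet Process prior avoids fixing the number of components, here is a minimal sketch of the standard stick-breaking construction of DP mixture weights, truncated at a finite length for illustration. The function name, the concentration value `alpha=2.0`, the truncation level and the 0.01 threshold are all illustrative assumptions; this is not the H-siHMM itself.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Draw truncated stick-breaking weights for a DP with concentration alpha."""
    weights = []
    remaining = 1.0  # length of the stick that has not been assigned yet
    for _ in range(truncation):
        beta = rng.betavariate(1.0, alpha)  # fraction of the remaining stick to break off
        weights.append(beta * remaining)
        remaining *= 1.0 - beta
    return weights


rng = random.Random(0)
weights = stick_breaking_weights(alpha=2.0, truncation=50, rng=rng)

# Only a handful of components receive non-negligible mass; the rest are nearly zero.
effective = sum(1 for w in weights if w > 0.01)
print(f"components with weight > 0.01: {effective} of {len(weights)}")
```

A larger `alpha` spreads the mass over more components, so the data rather than a pre-specified count determine how many activities end up being used.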

Empirical Analysis

  • Used trip counts on weekdays in destination zone “977”
  • All the trips for the same hourly period were gathered together
  • Resulting in 24 time steps, each with several observations of trip counts
  • We needed fast Bayesian inference: MCMC sampling using Hamiltonian Monte Carlo
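The grouping step described above can be sketched as follows. The record layout `(day, hour_period, trip_count)` and all the counts are made up for illustration; the actual contest data and preprocessing code are not shown in the slides.

```python
from collections import defaultdict

# Hypothetical weekday records for one destination zone: (day, hour_period, trip_count)
records = [
    ("Mon", 0, 12), ("Tue", 0, 9),
    ("Mon", 8, 140), ("Tue", 8, 155), ("Wed", 8, 149),
    ("Mon", 17, 210), ("Tue", 17, 198),
]


def group_by_hour(records, n_hours=24):
    """Gather trip counts from all days into one list per hourly period."""
    grouped = defaultdict(list)
    for _day, hour, count in records:
        grouped[hour].append(count)
    # One entry per time step, in hour order; hours with no data stay empty.
    return [grouped.get(hour, []) for hour in range(n_hours)]


time_steps = group_by_hour(records)
print(len(time_steps))   # 24
print(time_steps[8])     # [140, 155, 149]
```

Each of the 24 entries then holds the several observations for that time step, which is the shape of data a hidden Markov model over hourly periods would consume.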

Preliminary Results (1/3)

  • The proposed model captured 3 hidden activities (labeled “A1”, “A2”, and “A3” in figures below)

They match ground truth activities “Other”, “At home” and “Other”, respectively.

Preliminary Results (2/3)

  • Graph for periods T5 up to T23

Preliminary Results (3/3)

  • Statistical properties of captured activities
  • mu = location/center
  • sigma = standard deviation
  • theta = probability of transition between activities
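To make the roles of mu, sigma and theta concrete, the sketch below simulates a small hidden Markov model in which each activity emits trip counts around its own center. It assumes Gaussian emissions and uses made-up parameter values for three activities; neither the emission family nor the numbers are taken from the reported results.

```python
import random


def simulate_hmm(mu, sigma, theta, n_steps, rng, start_state=0):
    """Simulate hidden activities and observations from a Gaussian-emission HMM."""
    states, observations = [], []
    state = start_state
    for _ in range(n_steps):
        states.append(state)
        # Emission: observation centered at mu[state] with spread sigma[state]
        observations.append(rng.gauss(mu[state], sigma[state]))
        # Transition: row theta[state] gives the probabilities of the next activity
        state = rng.choices(range(len(mu)), weights=theta[state])[0]
    return states, observations


# Hypothetical parameters for three activities (illustrative only)
mu = [20.0, 150.0, 60.0]
sigma = [5.0, 20.0, 10.0]
theta = [
    [0.80, 0.15, 0.05],
    [0.10, 0.80, 0.10],
    [0.05, 0.15, 0.80],
]

states, observations = simulate_hmm(mu, sigma, theta, n_steps=24, rng=random.Random(1))
```

Inference runs in the opposite direction: given only the observed counts, it estimates mu, sigma, theta and the hidden activity at each time step.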

On-going & Future Work


  •  At least four on-going improvements:
    • Use disaggregated data
    • Model several regions simultaneously
    • Add other rich domain information, e.g. land use
    • Add models for weekends

Thank You

