Towards replicable mode choice models for transport simulations in France 

Sebastian Hörl

21 June 2023

ISTDM 2023

Introduction

  • Reproducibility
    • Low in transport modelling / simulation, especially with agent-based models
    • Can increase acceptance, uptake and more widespread use of these models
       
  • Increasingly available open data sources make reproducibility possible, but processes are neither standardized nor easily accessible as open source
     
  • Our goal: A pipeline from raw data to a calibrated large-scale agent-based transport simulation that is almost fully replicable and yields reproducible results

Context

  • Existing synthetic demand process
    • Open-source pipeline
    • Based on open data in France
    • Reference implementation for Paris and Île-de-France region
       
  • Adaptations for various places in France by multiple stakeholders
     
  • Compatible with MATSim simulation

Context

  • Problem
    • Choice model used in the simulation is hard-coded and not reproducible
    • Based on a model for Zurich that has been recalibrated
       
  • Challenge
    • Use available data to estimate a new choice model for Île-de-France
    • Generalizable to other use cases
    • Use as much open data as possible
       
  • Synthesis pipeline already processes, cleans, harmonizes many HTS data sets in France

General process

  • Formalize the data processing and estimation process
  • Focus on open data sets vs. proprietary APIs

General process: Cleaning

  • Harmonize French HTS data into the same format
    • Enquête Globale de Transport (Île-de-France)
      • Available upon request
      • Paris / Île-de-France
      • 2010/2011, new version coming up
    • Enquête Nationale Transport et Déplacements (France)
      • Open data
      • All France
      • 2008/2009
    • Various semi-standardized surveys designed by CEREMA
      • Enquête Déplacement Grand Territoire
      • Some open data (Nantes, Lille, ...)
      • Others available upon request
         
  • Yielding connected households, persons, trips and legs
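
The harmonized output described above can be pictured as a small nested data structure. The following is only a sketch: the class and field names are hypothetical and are not taken from the actual pipeline, which may store these tables differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Leg:
    mode: str            # e.g. "walk", "bus", "car" (hypothetical labels)
    distance_km: float


@dataclass
class Trip:
    trip_id: int
    purpose: str         # purpose of the activity following the trip
    departure_time_s: int
    legs: List[Leg] = field(default_factory=list)


@dataclass
class Person:
    person_id: int
    age: int
    trips: List[Trip] = field(default_factory=list)


@dataclass
class Household:
    household_id: int
    persons: List[Person] = field(default_factory=list)


# One household with one person making a single two-leg trip
trip = Trip(1, "work", 8 * 3600, [Leg("walk", 0.4), Leg("metro", 6.2)])
household = Household(10, [Person(100, 35, [trip])])

# Households link to persons, persons to trips, trips to legs
total_km = sum(
    leg.distance_km
    for person in household.persons
    for t in person.trips
    for leg in t.legs
)
```

Whatever survey a record comes from, the later steps only need to walk this hierarchy.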

General process: Spatialization

  • Most HTS do not provide detailed trip origin and destination information (exception: EGT)
     
  • Possible to impute likely locations based on
    • Euclidean distance between origins and destinations along the trip chain
    • Identifiers of origin and destination zones
    • Shapes of origin and destination zones
       
  • Balac, M., Hörl, S., Schmid, B., 2022. Discrete choice modeling with anonymized data. Transportation.
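
To illustrate the imputation idea, here is a minimal sketch for a single trip. It assumes that zones are simplified to axis-aligned boxes (real zone shapes are polygons) and that the survey reports a Euclidean distance. The function names and the sampling strategy are illustrative and are not the method of the cited paper, which works along the whole trip chain.

```python
import math
import random


def sample_point(box, rng):
    """Draw a uniform point from an axis-aligned box (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = box
    return (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))


def impute_locations(origin_box, destination_box, reported_distance,
                     samples=2000, seed=0):
    """Return the sampled (origin, destination) pair whose Euclidean
    distance is closest to the distance reported in the survey."""
    rng = random.Random(seed)
    best_pair, best_error = None, math.inf
    for _ in range(samples):
        origin = sample_point(origin_box, rng)
        destination = sample_point(destination_box, rng)
        error = abs(math.dist(origin, destination) - reported_distance)
        if error < best_error:
            best_pair, best_error = (origin, destination), error
    return best_pair, best_error


# Two hypothetical 1 km x 1 km zones (coordinates in metres)
zone_a = (0.0, 0.0, 1000.0, 1000.0)
zone_b = (3000.0, 0.0, 4000.0, 1000.0)
(origin, destination), error = impute_locations(zone_a, zone_b, 3200.0)
```

The zone identifiers select the boxes, the zone shapes bound the sampling, and the reported distance picks among the candidates.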

General process: Road routing

  • Plenty of APIs available (HERE, Bing, Google, TomTom, ...)
     
  • Goal: Use open data and make process very easy to use
     
  • Based on an OpenStreetMap dump (for instance, from Geofabrik)
     
  • Based on osmnx library in Python
     
  • Problem: Only speed limits are known

General process: Road routing

  • We use open information from the TomTom Traffic Index to inflate OSM travel times to realistic ones
     
  • Uniform adjustment of the factors based on travel times in the HTS

Source: TomTom Traffic Index Paris
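
A minimal sketch of the travel time inflation, under stated assumptions: the hourly congestion factors are placeholder values (not actual TomTom figures), and the uniform adjustment is taken to be a single multiplicative scale that makes total routed time match total reported time. The slides do not say how the adjustment is defined, so this is only one possible reading.

```python
def free_flow_time_s(length_m, speed_limit_kmh):
    """Travel time on a link when driving at the speed limit."""
    return length_m / (speed_limit_kmh / 3.6)


def calibrate_scale(trips, hourly_factor):
    """Uniform scale such that the total inflated routed time equals
    the total travel time reported in the survey."""
    routed = sum(t["free_flow_s"] * hourly_factor[t["hour"]] for t in trips)
    reported = sum(t["reported_s"] for t in trips)
    return reported / routed


def congested_time_s(free_flow_s, hour, hourly_factor, scale):
    """Free-flow time inflated by the hourly factor and the uniform scale."""
    return free_flow_s * hourly_factor[hour] * scale


# Placeholder congestion factors per departure hour (assumption)
hourly_factor = {8: 1.5, 14: 1.2}

# Hypothetical survey trips with routed free-flow and reported times
trips = [
    {"free_flow_s": 600.0, "hour": 8, "reported_s": 990.0},
    {"free_flow_s": 1000.0, "hour": 14, "reported_s": 1320.0},
]

scale = calibrate_scale(trips, hourly_factor)
morning_time = congested_time_s(600.0, 8, hourly_factor, scale)
```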

General process: Transit routing

  • Same idea: Avoid the use of APIs, allow for local processing
     
  • Based on GTFS data (usually available in France), sometimes from different periods
     
  • Routing of the trips using the RAPTOR algorithm (standalone implementation in MATSim)
     
  • Problem: How to choose the routing parameters?

General process: Transit routing

  • For calibration, we only look at transit trips in the HTS
     
  • We adjust the routing parameters:
    • Utility of transfer
    • Utility of travel time per mode (bus, tram, ...)
       
  • Using CMA-ES blackbox optimization
     
  • Objective:
    • Distribution of transfers (0, 1, 2, 3+)
    • Mode share of transit modes
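
The calibration objective could be sketched as follows. This is an assumption-laden illustration: the squared-error form, the equal weighting of both terms, and the trip record layout (`transfers`, `mode`) are hypothetical. In practice, the routed trips would come from rerunning the router with candidate parameters, and a CMA-ES implementation would minimize the returned value. Neither step is shown here.

```python
def shares(counts):
    """Convert absolute counts into shares that sum to one."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def transfer_bin(transfers):
    """Map a number of transfers to the bins 0, 1, 2, 3+."""
    return "3+" if transfers >= 3 else str(transfers)


def objective(routed_trips, reference_transfers, reference_modes):
    """Sum of squared differences between routed and reference shares."""
    transfer_counts = dict.fromkeys(reference_transfers, 0)
    mode_counts = dict.fromkeys(reference_modes, 0)
    for trip in routed_trips:
        transfer_counts[transfer_bin(trip["transfers"])] += 1
        mode_counts[trip["mode"]] += 1
    routed_transfers = shares(transfer_counts)
    routed_modes = shares(mode_counts)
    error = sum((routed_transfers[k] - reference_transfers[k]) ** 2
                for k in reference_transfers)
    error += sum((routed_modes[k] - reference_modes[k]) ** 2
                 for k in reference_modes)
    return error


# Hypothetical reference shares, standing in for values from the HTS
reference_transfers = {"0": 0.5, "1": 0.25, "2": 0.25, "3+": 0.0}
reference_modes = {"bus": 0.5, "rail": 0.5}

good = [{"transfers": 0, "mode": "bus"}, {"transfers": 0, "mode": "bus"},
        {"transfers": 1, "mode": "rail"}, {"transfers": 2, "mode": "rail"}]
bad = [{"transfers": 0, "mode": "bus"}] * 4

good_error = objective(good, reference_transfers, reference_modes)
bad_error = objective(bad, reference_transfers, reference_modes)
```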

General process: Transit routing

  • Optimization using CMA-ES

General process: Transit routing

  • Fit of the distributions
    • Baseline: Minimize travel time (-1 u/h and -1 u/transfer)

General process: Additional components

  • Parking pressure based on open data
    • Registered vehicles in zone divided by accessible road network
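
As a small illustration of this indicator, the sketch below assumes that the accessible road network is measured as a length in kilometres. The unit and the example values are assumptions, not taken from the slides.

```python
def parking_pressure(registered_vehicles, road_length_km):
    """Registered vehicles per kilometre of accessible road in a zone."""
    if road_length_km <= 0:
        raise ValueError("road length must be positive")
    return registered_vehicles / road_length_km


# Hypothetical zones: (registered vehicles, accessible road length in km)
zones = {"A": (1200, 40.0), "B": (300, 60.0)}
pressure = {zone: parking_pressure(*values) for zone, values in zones.items()}
```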

General process: Cost structure

  • Need to make hypotheses on the costs (for 2010)
    • Car: 20 ct/km
    • Parking: 3 EUR/h (Paris 2010, based on duration of following activity)
    • Public transport: Per ticket or per duration
       
  • Special case for Île-de-France / Paris
    • Free if the person has a public transit subscription (person attribute)
    • 1.80 EUR for a trip within Paris or using only bus or metro
    • Otherwise, regression model for regional tickets (Abdelkader DIB, IFPen)

Distances: OP = Origin → Paris; DP = Destination → Paris; D = Direct
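
The cost hypotheses above can be written down as a short sketch. The constants come from the slide. The `parking_charged` flag and the `regional_price` callable are assumptions: the latter is a placeholder for the regression model for regional tickets, whose coefficients are not given here.

```python
CAR_COST_EUR_PER_KM = 0.20
PARKING_COST_EUR_PER_H = 3.0
PARIS_TICKET_EUR = 1.80


def car_cost(distance_km, following_activity_h, parking_charged):
    """Distance cost plus parking for the duration of the next activity."""
    cost = CAR_COST_EUR_PER_KM * distance_km
    if parking_charged:
        cost += PARKING_COST_EUR_PER_H * following_activity_h
    return cost


def transit_cost(has_subscription, within_paris, only_bus_or_metro,
                 regional_price):
    """Ticket cost for a transit trip in Île-de-France.

    `regional_price` is a callable standing in for the regional
    ticket regression model (placeholder, not the actual model).
    """
    if has_subscription:
        return 0.0
    if within_paris or only_bus_or_metro:
        return PARIS_TICKET_EUR
    return regional_price()


# 10 km by car, followed by a 2 h activity with paid parking
example_car = car_cost(10.0, 2.0, parking_charged=True)

# Regional trip without subscription; 4.0 EUR is a made-up placeholder
example_regional = transit_cost(False, False, False, lambda: 4.0)
```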

Model structure

Model estimation

  • Using Biogeme's Python interface
     
  • 18 parameters
  • R² = 0.53
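
For readers unfamiliar with discrete choice models, the sketch below shows how a multinomial logit model turns utilities into mode choice probabilities. The model structure is only shown in figures on the slides, so the utility function, the parameter values and the mode attributes here are hypothetical. The estimation with Biogeme is not shown.

```python
import math


def logit_probabilities(utilities):
    """Multinomial logit: P(i) = exp(V_i) / sum_j exp(V_j)."""
    max_u = max(utilities.values())  # subtract the maximum for stability
    exp_u = {mode: math.exp(u - max_u) for mode, u in utilities.items()}
    total = sum(exp_u.values())
    return {mode: value / total for mode, value in exp_u.items()}


# Hypothetical parameters (not the estimated values)
BETA_TIME_PER_MIN = -0.03
BETA_COST_PER_EUR = -0.20
ASC = {"car": 0.0, "pt": -0.5, "walk": 0.2}


def utility(mode, travel_time_min, cost_eur):
    """Linear-in-parameters utility with an alternative-specific constant."""
    return (ASC[mode]
            + BETA_TIME_PER_MIN * travel_time_min
            + BETA_COST_PER_EUR * cost_eur)


# One trip with made-up travel times (minutes) and costs (EUR) per mode
utilities = {
    "car": utility("car", 20.0, 4.0),
    "pt": utility("pt", 30.0, 1.8),
    "walk": utility("walk", 60.0, 0.0),
}
probabilities = logit_probabilities(utilities)
```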

Simulation

  • Model has been implemented in the simulation, so we have first results
  • Currently, calibrating network parameters

Conclusion and outlook

  • First prototype of the pipeline works for Île-de-France
  • Currently packaging up the code and preparing a paper on the baseline case
     
  • Model improvements
    • Improve simplified travel time estimation
    • Integrate walking and bicycle routing (but few data available in 2010)
    • Include weather information and, more generally, enrich the model formulation
       
  • Porting to other areas in France
    • Path 1: Compare models for different areas, hopefully they are similar
    • Path 2: Estimate a joint model for France with (ideally non-significant) regional dummies

Questions?