
Towards replicable mode choice models for transport simulations in France
Sebastian Hörl
21 June 2023
ISTDM 2023
Introduction

- Reproducibility
  - Low in transport modelling / simulation, especially with agent-based models
  - Can increase acceptance, uptake and more widespread use of these models
  - Increasingly available open data sources make reproducibility possible, but processes are either not standardized or not easily accessible as open source
- Our goal: Have a pipeline from raw data to a calibrated large-scale agent-based transport simulation that is nearly 100% replicable, with reproducible results.

Context

- Existing synthetic demand process
  - Open-source pipeline
  - Based on open data in France
  - Reference implementation for Paris and Île-de-France region
  - Adaptations for various places in France by multiple stakeholders
  - Compatible with MATSim simulation

Context

- Problem
  - Choice model used in the simulation is not reproducible, hard-coded
  - Based on a model for Zurich that has been recalibrated
- Challenge
  - Use available data to estimate a new choice model for Île-de-France
  - Generalizable to other use cases
  - Use as much open data as possible
  - Synthesis pipeline already processes, cleans, harmonizes many HTS data sets in France

General process

- Formalize the data processing and estimation process
- Focus on open data sets vs. proprietary APIs

General process: Cleaning

- Harmonize French HTS data into the same format
  - Enquête Globale de Transport (Île-de-France)
    - Available upon request
    - Paris / Île-de-France
    - 2010/2011, new version coming up
  - Enquête Nationale Transport et Déplacements (France)
    - Open data
    - All France
    - 2008/2009
  - Various semi-standardized surveys designed by CEREMA
    - Enquête Déplacement Grand Territoire
    - Some open data (Nantes, Lille, ...)
    - Others available upon request
- Yielding connected households, persons, trips and legs

General process: Spatialization

- Most HTS do not provide detailed trip origin and destination information (exception EGT)
- Possible to impute likely locations based on
  - Euclidean distance between origins and destinations along the trip chain
  - Identifiers of origin and destination zones
  - Shapes of origin and destination zones
- Balac, M., Hörl, S., Schmid, B., 2022. Discrete choice modeling with anonymized data. Transportation.
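
The imputation idea can be illustrated with a minimal sketch. It is not the method of the cited paper: it handles a single trip rather than a full trip chain, and it assumes that zones are axis-aligned rectangles in a metric coordinate system, whereas real zones are polygons. The function names, the tolerance and the attempt limit are hypothetical.

```python
import math
import random


def sample_point(zone, rng):
    """Draw a uniform point from a rectangular zone (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = zone
    return rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)


def impute_trip(origin_zone, destination_zone, distance,
                tolerance=50.0, max_attempts=10_000, seed=0):
    """Sample an origin and a destination matching a reported Euclidean distance.

    Assumption: zones are rectangles and distances are in metres.
    Returns the best candidate pair found and its absolute distance error.
    Sampling stops as soon as a pair is within `tolerance` of the target.
    """
    rng = random.Random(seed)
    best, best_error = None, math.inf
    for _ in range(max_attempts):
        origin = sample_point(origin_zone, rng)
        destination = sample_point(destination_zone, rng)
        error = abs(math.dist(origin, destination) - distance)
        if error < best_error:
            best, best_error = (origin, destination), error
        if error <= tolerance:
            break
    return best, best_error
```

For a trip chain, the same idea would have to be applied jointly, since the destination of one trip is the origin of the next.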

General process: Road routing

- Plenty of APIs available (HERE, Bing, Google, TomTom, ...)
- Goal: Use open data and make process very easy to use
  - Based on an OpenStreetMap dump (for instance, from Geofabrik)
  - Based on the osmnx library in Python
  - Problem: Only speed limits are known
- We use open information from the TomTom Traffic Index to inflate OSM travel times to realistic ones (source: TomTom Traffic Index Paris)
- Uniform adjustment of the factors based on travel times in the HTS
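
The inflation step could look roughly like the sketch below. It leaves out the actual routing (done with osmnx in the pipeline) and starts from a route given as a list of edges. The hourly congestion values, the interpretation of the "uniform adjustment" as a single scale applied to all hourly factors, and all function names are assumptions made for illustration.

```python
def free_flow_time(edges):
    """Travel time in seconds if every edge is driven at its speed limit.

    edges: list of (length_m, speed_limit_kmh) tuples.
    """
    return sum(length_m / (speed_kmh / 3.6) for length_m, speed_kmh in edges)


def congested_time(edges, departure_hour, congestion_by_hour, scale=1.0):
    """Inflate the free-flow time with an hourly congestion level.

    congestion_by_hour maps hour -> extra travel time share (0.5 = +50 %).
    `scale` is the hypothetical uniform adjustment applied to all hours.
    """
    factor = 1.0 + scale * congestion_by_hour.get(departure_hour, 0.0)
    return free_flow_time(edges) * factor


def calibrate_scale(trips, congestion_by_hour, candidates):
    """Pick the scale minimising the mean absolute error against the HTS.

    trips: list of (edges, departure_hour, reported_time_s) tuples.
    """
    def mean_abs_error(scale):
        errors = [
            abs(congested_time(edges, hour, congestion_by_hour, scale) - reported)
            for edges, hour, reported in trips
        ]
        return sum(errors) / len(errors)

    return min(candidates, key=mean_abs_error)
```

A grid search is used here only to keep the sketch short; any one-dimensional optimizer would do.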


General process: Transit routing

- Same idea: Avoid the use of APIs, allow for local processing
- Based on GTFS data (usually available in France), sometimes from different periods
- Routing of the trips using the RAPTOR algorithm (standalone implementation in MATSim)
- Problem: How to choose the routing parameters?

General process: Transit routing

- For calibration, we only look at transit trips in the HTS
- We adjust the routing parameters:
  - Utility of transfer
  - Utility of travel time per mode (bus, tram, ...)
- Using CMA-ES blackbox optimization
- Objective:
  - Distribution of transfers (0, 1, 2, 3+)
  - Mode share of transit modes
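
One way to express such an objective is sketched below. The slides do not state the exact error measure, so the sum of absolute differences, the use of one main transit mode per trip, and the function names are assumptions. A black-box optimizer such as CMA-ES would repeatedly route all HTS transit trips with candidate parameters and evaluate this function on the result.

```python
from collections import Counter


def transfer_distribution(transfer_counts):
    """Share of trips with 0, 1, 2 and 3+ transfers."""
    counts = Counter(min(n, 3) for n in transfer_counts)
    total = len(transfer_counts)
    return [counts[k] / total for k in range(4)]


def mode_shares(modes, categories):
    """Share of trips per transit mode, in the order given by `categories`."""
    counts = Counter(modes)
    total = len(modes)
    return [counts[mode] / total for mode in categories]


def objective(routed, reference, categories):
    """Sum of absolute differences between routed and reference distributions.

    routed / reference: dicts with keys "transfers" (list of int, one per
    trip) and "modes" (list of str, assumed main transit mode per trip).
    """
    routed_values = (transfer_distribution(routed["transfers"])
                     + mode_shares(routed["modes"], categories))
    reference_values = (transfer_distribution(reference["transfers"])
                        + mode_shares(reference["modes"], categories))
    return sum(abs(a - b) for a, b in zip(routed_values, reference_values))
```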

General process: Transit routing

- Optimization using CMA-ES

General process: Transit routing

- Fit of the distributions
- Baseline: Minimize travel time (-1 u/h and -1 u/transfer)


General process: Additional components

- Parking pressure based on open data
  - Registered vehicles in zone divided by accessible road network
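
As a small illustration, the indicator can be computed per zone as below. The slides do not say how the accessible road network is measured, so using its length in kilometres is an assumption, as are the function name and the handling of zones without roads.

```python
def parking_pressure(vehicles_by_zone, road_km_by_zone):
    """Registered vehicles per kilometre of accessible road, per zone.

    Assumption: the road network is measured by its length in km.
    Zones without any accessible road are skipped.
    """
    return {
        zone: vehicles / road_km_by_zone[zone]
        for zone, vehicles in vehicles_by_zone.items()
        if road_km_by_zone.get(zone, 0.0) > 0.0
    }
```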

General process: Cost structure

- Need to make hypotheses on the costs (for 2010)
  - Car: 20 ct/km
  - Parking: 3 EUR/h (Paris 2010, based on duration of following activity)
  - Public transport: Per ticket or per duration
- Special case for Île-de-France / Paris
  - Free if the person has a public transit subscription (person attribute)
  - 1.80 EUR for a trip within Paris or using only bus or metro
  - Otherwise, regression model for regional tickets (Abdelkader DIB, IFPen)
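
These cost hypotheses can be written down as a short sketch. The numeric values come from the list above; the function names and signatures are hypothetical, the caller is assumed to decide when parking applies, and the regional regression model is not given in the slides, so it is passed in as a placeholder callable.

```python
CAR_COST_EUR_PER_KM = 0.20      # 20 ct/km
PARKING_COST_EUR_PER_H = 3.0    # Paris, 2010
PARIS_TICKET_EUR = 1.80


def car_cost(distance_km, parking_hours=0.0):
    """Distance cost plus parking for the duration of the following activity.

    Assumption: the caller passes parking_hours > 0 only where parking is paid.
    """
    return CAR_COST_EUR_PER_KM * distance_km + PARKING_COST_EUR_PER_H * parking_hours


def transit_cost(has_subscription, within_paris, only_bus_or_metro,
                 regional_fare=None):
    """Public transport cost for the Île-de-France special case.

    regional_fare: placeholder callable standing in for the regression
    model for regional tickets, which is not specified here.
    """
    if has_subscription:
        return 0.0
    if within_paris or only_bus_or_metro:
        return PARIS_TICKET_EUR
    if regional_fare is None:
        raise ValueError("a regional fare model is required for this trip")
    return regional_fare()
```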

Distances: OP = Origin > Paris; DP = Destination > Paris; D = Direct

Model structure

[Figure: model structure]

Model estimation

- Using Biogeme's Python interface
- 18 parameters
- R2 = 0.53
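
For readers unfamiliar with the estimation quantities, the sketch below shows how choice probabilities and a goodness-of-fit measure are computed for a multinomial logit model. Both the logit form and the reading of the reported R2 as McFadden's pseudo-R² are assumptions, since the slides state neither. The code does not use Biogeme and the function names are hypothetical.

```python
import math


def mnl_probabilities(utilities):
    """Multinomial logit probabilities from a dict mode -> systematic utility."""
    highest = max(utilities.values())  # subtract the maximum for numerical stability
    exponentials = {mode: math.exp(v - highest) for mode, v in utilities.items()}
    total = sum(exponentials.values())
    return {mode: e / total for mode, e in exponentials.items()}


def log_likelihood(observations):
    """observations: list of (utilities, chosen_mode) pairs."""
    return sum(
        math.log(mnl_probabilities(utilities)[chosen])
        for utilities, chosen in observations
    )


def mcfadden_rho_squared(observations):
    """1 - LL(model) / LL(null), with equal probabilities in the null model."""
    ll_model = log_likelihood(observations)
    null = [({mode: 0.0 for mode in u}, chosen) for u, chosen in observations]
    return 1.0 - ll_model / log_likelihood(null)
```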

Simulation

- Model has been implemented in the simulation, so we have first results
- Currently, calibrating network parameters



Conclusion and outlook

- First prototype of the pipeline works for Île-de-France
- Currently packaging up the code and preparing a paper on the baseline case
- Model improvements
  - Improve simplified travel time estimation
  - Integrate walking and bicycle routing (but little data available for 2010)
  - Include weather information and, more generally, make the model formulation more complex
- Porting to other areas in France
  - Path 1: Compare models for different areas, hopefully they are similar
  - Path 2: Estimate a joint model for France with (ideally non-significant) regional dummies

Questions?


ISTDM 2023, Ispra, June 2023