Calibration & Optimisation

@reyman64

@ElCep

exModelo 2019

Etienne Delay
Sébastien Rey-Coyrehourcq
& Paul Chapron

Model at first sight

Local sensitivity - OAT

"Face Validity"

"One-Variable-At-A-Time"

- unreliable
- time-consuming
- limited
- difficult to reproduce

"Tinkering" Papert & Resnick

- playful
- interactive
- exploratory
- incremental
- trial and error

A first step to systematize OAT?

=> Overall Sensitivity

a simple model,

a complete experimental design:

P1 x P2 x P3

P_1

P_2

P_3

O_1

500 kB

3 graphs

systematic outputs

How? How?

{discreteStep}^{P} \times replication \times time

P = 3 \text{ (number of parameters)}

discreteStep = 11

replication = 30

time = 1 \text{ min}

11^3 \times 30 \times 1 \text{ min} = 39930 \text{ min} \approx 27.7 \text{ days}

11^3 \times 30 \times 500 \text{ kB} = 19965000 \text{ kB} \approx 20 \text{ GB}

11^3 \times 30 \times 3 \text{ graphs} = 119790 \text{ graphs}

Duration & Volumetry impossible to manage
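
The arithmetic of the complete design can be checked with a few lines of plain Scala. This is only a sketch: the object and method names (`FullFactorialCost`, `runs`) are illustrative and not part of OpenMOLE, and the per-run costs (1 minute, 500 kB, 3 graphs) are the ones assumed on the slide.

```scala
object FullFactorialCost {
  // Runs needed by a complete design: one run per grid point and per replication.
  def runs(discreteStep: Int, nbParameters: Int, replications: Int): Long =
    BigInt(discreteStep).pow(nbParameters).toLong * replications

  def main(args: Array[String]): Unit = {
    val n = runs(discreteStep = 11, nbParameters = 3, replications = 30)
    val days = n * 1.0 / 60 / 24      // assuming 1 minute per run
    val gigabytes = n * 500.0 / 1e6   // assuming 500 kB per run
    println(s"$n runs, $days days, $gigabytes GB, ${n * 3} graphs")
    // One extra parameter multiplies every cost by discreteStep.
    println(runs(discreteStep = 11, nbParameters = 4, replications = 30))
  }
}
```

With a fourth parameter the same design needs 11 times more runs, which is the combinatorial explosion in practice.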

Issue no. 1 - Combinatorial explosion

How? How?

Issue no. 2 - "Curse of Dimensionality"

dimensions

coverage with an equal number of points

How? How?

A first solution, more sophisticated methods.

http://reseau-mexico.fr/

How? How?

A second solution, change your approach!

Organize a tension between the
Input choices and Output choices

model

(mechanisms, parameters)

distance to data,
stylized fact/patterns

choosing Input

choosing Output

How? How?

generating

questioning

Quantified criteria to "guide"
automated exploration

What tensioning?

criteria

How? How?

Evolutionary Algorithm

Darwin-like algorithms

Evolutionary algorithms (EAs) are population-based metaheuristic optimization algorithms that use biology-inspired mechanisms like mutation, crossover, natural selection, and survival of the fittest in order to refine a set of solution candidates iteratively.

Advantages :

Population based => perfect for HPC
As good for exploration as for exploitation
Optimize one or multiple objectives

Drawbacks :

Computationally greedy !
Black Box
Lots of parameters
Termination criteria ?

Definitions

Objectives vs Fitness/Heuristic

Heuristic function

A heuristic function is a way to inform the search about the direction to a goal. It provides an informed way to guess which path is promising. The heuristic function can be a direct or indirect measurement, for example by providing only an approximation of the distance between a measurement and the optimum.

Here’s an algorithm for driving to someone’s house :

Take Highway 167 south to Puyallup. Take the South Hill Mall exit and drive 4.5 miles up the hill. Turn into the driveway of the large tan house on the left, at 714 North Cedar.

Here’s a heuristic for getting to someone’s house :

Find the last letter we mailed you. Drive to the town in the return address. When you get to town, ask someone where our house is. If you can’t find anyone, call us from a public phone, and we’ll come get you.

based on [McConnell 2004]

Definitions

Objectives vs Fitness/Heuristic

Objective function

The objective function can be considered a form of heuristic, except that it is a direct measure of the potential of an aspect of the solution. In this sense, the objective function often requires more expertise about the system than the heuristic does.

based on [Weise2011 p36]

Fitness function

A fitness function is a particular type of objective function that is used to summarise, as a single figure of merit, how close a given design solution is to achieving the set aims.

based on Wikipedia

Evolutionary Algorithm

Translation to simulation

P = G_0 .. G_n

Genome contains

values of parameters

mutate & crossover
values

simulation

I_1 = G_1 + \{O_0 ... O_m \}

I_n = G_n + \{O_0 ... O_m \}

...
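
To make the "genome = parameter values" idea concrete, here is a minimal sketch in plain Scala. The operators shown (Gaussian mutation with clamping, uniform crossover) are assumptions chosen for illustration; they are not necessarily the operators used by MGO or OpenMOLE, and `GenomeSketch`, `mutate` and `crossover` are hypothetical names.

```scala
import scala.util.Random

object GenomeSketch {
  // A genome holds one value per model parameter.
  type Genome = Vector[Double]

  // Gaussian mutation: perturb each gene, then clamp it back into its bounds.
  def mutate(g: Genome, bounds: Vector[(Double, Double)], sigma: Double, rng: Random): Genome =
    g.zip(bounds).map { case (x, (lo, hi)) =>
      math.min(hi, math.max(lo, x + rng.nextGaussian() * sigma))
    }

  // Uniform crossover: each gene is copied from one parent or the other.
  def crossover(a: Genome, b: Genome, rng: Random): Genome =
    a.zip(b).map { case (x, y) => if (rng.nextBoolean()) x else y }
}
```

An individual I_k is then the genome G_k together with the objective values the simulation returns for it.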

Importance of both
Exploration & Exploitation

Some fitness landscapes to understand (1 obj.)

locked on local optima

easy landscape, one global optimum

blocked on local optima

jump outside local optima

Evolutionary Algorithm

Take the best and try again ?

Objective space : Pareto introduction

Price

Gasoil

Each car dominates all others

by at least one objective

Vilfredo Pareto, economist (1848-1923)

Pareto front

	d	c
Price	3	2
Gasoil	1	2

Non-Dominated Front

Take the best and try again ?

Objective space : Dominance based criteria

Price

Gasoil

Definition :

An element x1 dominates (is preferred to) an element x2 (x1 ⊣ x2) if :

(a) x1 is better than x2 in at least one objective function

and (b) x1 is not worse than x2 with respect to all other objectives.

Does e dominate h ?

e better on Price (False) ? Gasoil (True) ?

e not worse on Price (False) ?

=> e doesn't dominate h

Does e dominate g ?

e better on Price (True) ? Gasoil (True) ?

=> e dominates g
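
The dominance test can be sketched as a small Scala function. It assumes every objective is minimized (as with Price and Gasoil here); `Dominance` and `dominates` are illustrative names, not an OpenMOLE API.

```scala
object Dominance {
  // All objectives are minimized: a smaller value is better.
  def dominates(x1: Seq[Double], x2: Seq[Double]): Boolean = {
    require(x1.size == x2.size, "both elements need the same objectives")
    val pairs = x1.zip(x2)
    val notWorseAnywhere = pairs.forall { case (a, b) => a <= b }
    val betterSomewhere = pairs.exists { case (a, b) => a < b }
    notWorseAnywhere && betterSomewhere
  }
}
```

With the two cars of the Pareto front table, d = (Price 3, Gasoil 1) and c = (Price 2, Gasoil 2), neither `dominates(d, c)` nor `dominates(c, d)` holds: each one is better on one objective, so both stay on the non-dominated front.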

Not so easy !

Maintaining a diversity of solutions

Like many board games... winning n battles doesn't mean you win the whole war !

Battles lost

War lost

offspring

Iteration 1

Iteration 2
(only & ) children

Battles lost

Why ?

Imagine

A simulation with a thousand parameters held fixed for the engines; you play with only one : power.

We keep searching to minimize Price & Gasoil on cars.

O_{Price}

O_{Gasoil}

P_{power}

laws unknown to the optimizer

pareto

How to distinguish between overall good solutions and locally fit solutions ?

optimizer doesn't know !

Relations between parameters & objectives space

P_{size}

P_{power}

O_{Price}

O_{Gasoil}

O_{Price} = f_1(P_{power}, P_{size})

O_{Gasoil} = f_2(P_{power}, P_{size})

Optimizer : how do I select & move on to the next step ?

=> Fitness helps to solve this

Model

Objective Space (2 obj.)

Parameter Space (2 param.)

Not so easy !

Take the best and try again ?

Summing up the situation

EAs proceed by indirect moves along the n dimensions of the parameter space to get better values in the objective space => FITNESS

Hyp. of EA : a small move in parameter space could produce a better result in objective space ... (and it works really well in many cases ...)

Ranking ?

Pareto ranking & diversity algorithm

	f1	f2	r1	r2
a	3.5	1	1	1
b	3	1.5	1	1
c	2	2	1	1
d	0.5	4	1	1
e	0.5	4.5	1	1
f			2	2
g			3	5
h			2	2
i			3	4
j			2	3
k			2	4
l			2	2
m			3	6
n			5	11
o			4	8

[Figure: Pareto ranking example with fronts F_1, F_2, F_3, F_4, F_5. Sets shown: \emptyset, \{e\}, \{d,e,f,h\}, \{d\}, \{c,d,h\}, \{c,d\}, \{a,b,c\}, \{a\}, \{a,b,c,k,l\}, \{a,b,c,d,e,h,i,j,k,o\}, \{b,c,d,e,h,i,j\}]

ranking is used to compute

a fitness
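
One common way to turn dominance into a rank is to peel off successive non-dominated fronts: the first front gets rank 1, the next rank 2, and so on. The sketch below assumes minimized objectives and uses hypothetical points, not the values of the table above; `ParetoRanking` and `fronts` are illustrative names.

```scala
object ParetoRanking {
  def dominates(a: Seq[Double], b: Seq[Double]): Boolean =
    a.zip(b).forall { case (x, y) => x <= y } && a.zip(b).exists { case (x, y) => x < y }

  // Returns the indices of the points grouped by front: fronts(0) is F_1, fronts(1) is F_2, ...
  def fronts(points: Vector[Seq[Double]]): Vector[Vector[Int]] = {
    @annotation.tailrec
    def loop(remaining: Vector[Int], acc: Vector[Vector[Int]]): Vector[Vector[Int]] =
      if (remaining.isEmpty) acc
      else {
        // A point belongs to the current front if no remaining point dominates it.
        val front = remaining.filter(i => !remaining.exists(j => dominates(points(j), points(i))))
        loop(remaining.filterNot(i => front.contains(i)), acc :+ front)
      }
    loop(points.indices.toVector, Vector.empty)
  }
}
```

The rank of a point (the 1-based index of its front) can then serve directly as a fitness to minimize.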

One famous algorithm

NSGA 2

[Figure: NSGA-II loop. P_t and Q_t are merged into R_t, sorted into fronts F_1 ... F_5, giving P_{t+1}, Q_{t+1} and R_{t+1}]

keeping the P_t candidates in the first 3 Pareto fronts to maintain diversity

Pop (N) &
offspring (Q)

selection < N

generate offspring

merge

rejected
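
The merge / sort / reject step can be sketched as follows. This is a simplified reading of NSGA-II survival selection under stated assumptions: objectives are minimized, individuals are reduced to their objective vectors, and the last front that does not fit entirely is truncated by crowding distance. Names such as `Nsga2Survival` and `select` are hypothetical and this is not the MGO implementation.

```scala
object Nsga2Survival {
  type Objectives = Vector[Double]

  def dominates(a: Objectives, b: Objectives): Boolean =
    a.zip(b).forall { case (x, y) => x <= y } && a.zip(b).exists { case (x, y) => x < y }

  // Successive non-dominated fronts F_1, F_2, ... of the merged population R_t.
  def fronts(r: Vector[Objectives]): Vector[Vector[Objectives]] =
    if (r.isEmpty) Vector.empty
    else {
      val (front, rest) = r.partition(p => !r.exists(q => dominates(q, p)))
      front +: fronts(rest)
    }

  // Crowding distance: larger means the point sits in a less crowded part of its front.
  def crowding(front: Vector[Objectives]): Vector[Double] = {
    val dist = Array.fill(front.size)(0.0)
    for (m <- front.head.indices) {
      val order = front.indices.sortBy(i => front(i)(m))
      val range = front(order.last)(m) - front(order.head)(m)
      dist(order.head) = Double.PositiveInfinity // always keep the extremes
      dist(order.last) = Double.PositiveInfinity
      if (range > 0)
        for (k <- 1 until order.size - 1)
          dist(order(k)) += (front(order(k + 1))(m) - front(order(k - 1))(m)) / range
    }
    dist.toVector
  }

  // P_{t+1}: take whole fronts while they fit, then the least crowded points of the next front.
  def select(parents: Vector[Objectives], offspring: Vector[Objectives], n: Int): Vector[Objectives] =
    fronts(parents ++ offspring).foldLeft(Vector.empty[Objectives]) { (next, front) =>
      if (next.size + front.size <= n) next ++ front
      else {
        val byCrowding = front.zip(crowding(front)).sortBy { case (_, d) => -d }.map(_._1)
        next ++ byCrowding.take(n - next.size)
      }
    }
}
```

Everything that is not selected corresponds to the "rejected" part of R_t; new offspring Q_{t+1} are then generated from the selected P_{t+1}.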

OpenMOLE & MGO library

NSGA 2

generational EA

G = I_0... I_{10}

I_0

I_1

I_2

I_3

I_4... I_{10}

I_0

I_1

I_2

I_3

Steady State EA : as soon as possible strategy

OpenMOLE & MGO library

NSGA 2

Steady State Island EA

Isl_n = Isl_0... Isl_{10}

Isl_0

Isl_1

Isl_2

Isl_3

Isl_4... Isl_{10}

Isl

P = \sum{Isl}

counter-balancing overhead on remote env.

OpenMOLE Syntax

NSGA 2

NSGA2Evolution(
  evaluation = model,
  genome = Seq(
   myParameter1 in (0.0, 1.0),
   myParameter2 in (-1.0,1.0)
  ),
  objectives = Seq(myObjective1, myObjective2),
  termination = 20000,
  parallelism = 200,
  distribution = Island(5 minutes),
  stochastic = Stochastic(seed = seed)
) hook (workDirectory / "results/example") on env

2 objectives to minimize
2 degrees of freedom / parameters (genome)
200 islands (parallelism) of 5 minutes each
20000 evolutions performed by all the islands (200 islands * 100 iterations)
1 hook to save the population

ZombieLand

Trying to calibrate free parameters !

Application to Zombieland

val humanFollowProbability = Val[Double]
val humanInformedRatio = Val[Double]
val humanInformProbability = Val[Double]

val result = zombieInvasion(
  humanFollowProbability = humanFollowProbability,
  humanInformedRatio = humanInformedRatio,
  humanInformProbability = humanInformProbability,
  zombies = 4,
  humans = 250,
  steps = 500,
  random = rng)

Data Data Data !

Application to Zombieland

[Figure: observed evacuations (observed evac.) loaded from dynamics.csv; axes: steps, evacuated]

Road to Objective : strategy to compare data (real/sim.) ?

Application to Zombieland

	20	40	...	500
1
2
...
100

evac.

step

median by step

median by evac.

steps

evacuated

Road to Objective : simulated vs real data ?

Application to Zombieland

500

480

460

real

simulated

d_{0}

d_{1}

d_{2}

d_{3}

d_{99}

d_{100}

O_{calibration} = \sum_{i=0}^{100} d_i

search in metrics

OpenMolisation : function to aggregate simulated data

Application to Zombieland

Vector[Array[Int]]

val v: Vector[Array[Int]] =
  Vector(Array(2, 3, 5), 
         Array(2, 5, 6), 
         Array(3, 7, 1))

println(v.transpose)
// return Vector(Vector(2, 2, 3), Vector(3, 5, 7), Vector(5, 6, 1))

println(v.transpose.map(column => column.sum))
// return Vector(7, 15, 12)

println(v.transpose.map(column => column.median))
// return Vector(2, 5, 5)

2	3	5
2	5	6
3	7	1

median

def distance(dataSim: Vector[Array[Int]]) = {
  val realData = Array( ... ) // my median on real data
  absoluteDistance ( dataSim, ... ) // obj
}

transpose example

function to define in the workflow

aggregate.oms
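
Outside OpenMOLE, the same aggregation can be sketched in plain Scala. Here `median` is written by hand (it is not part of the Scala standard library), `realData` is passed explicitly as a hypothetical argument, and the distance is the sum of absolute gaps d_i between simulated medians and real values. All names are illustrative.

```scala
object Aggregate {
  // Median of a non-empty sequence (mean of the two middle values when the size is even).
  def median(xs: Seq[Double]): Double = {
    val s = xs.sorted
    val mid = s.size / 2
    if (s.size % 2 == 1) s(mid) else (s(mid - 1) + s(mid)) / 2.0
  }

  // dataSim holds one array per simulation run and one column per observation step.
  def medianByStep(dataSim: Vector[Array[Int]]): Vector[Double] =
    dataSim.map(_.toVector).transpose.map(column => median(column.map(_.toDouble)))

  // Sum of the absolute gaps between simulated medians and real data.
  def distance(dataSim: Vector[Array[Int]], realData: Vector[Double]): Double =
    medianByStep(dataSim).zip(realData).map { case (s, r) => math.abs(s - r) }.sum
}
```

For the 3 x 3 example above, `medianByStep` gives the same column medians as the slide (2, 5, 5), as doubles.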

OpenMolisation : the last bricks

Application to Zombieland

NSGA2Evolution(
  evaluation = model ,
  genome = Seq(
    humanInformedRatio in (0.0, 1.0),
    humanInformProbability in (0.0, 1.0),
    humanFollowProbability in (0.0, 1.0)
  ),
  objectives = Seq(rescuedDynamic aggregate distance),
  termination = 20000,
  parallelism = 200,
  distribution = Island(5 minutes),
  stochastic = Stochastic(seed = seed)
) hook (workDirectory / "results_calibration/distance", frequency = 20) on env

EA definition with the calibration objective

Results

Application to Zombieland

Real dynamics

Simulated dynamics

Wait a sec ...

Application to Zombieland

the optimization is performed with a sum-of-distances criterion

it doesn't capture the overall dynamic to its full extent

Open question :
- introduce an additional criterion ?
- change the aggregate function (MSE ?)
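
To see why the choice of aggregate matters, the sketch below compares a sum of absolute distances with a mean squared error on two hypothetical simulated series. The numbers are made up for illustration; only the two formulas are standard.

```scala
object AggregateAlternatives {
  def sumAbsolute(sim: Seq[Double], real: Seq[Double]): Double =
    sim.zip(real).map { case (s, r) => math.abs(s - r) }.sum

  def meanSquaredError(sim: Seq[Double], real: Seq[Double]): Double =
    sim.zip(real).map { case (s, r) => (s - r) * (s - r) }.sum / sim.size

  def main(args: Array[String]): Unit = {
    val real = Seq(10.0, 20.0, 30.0)
    val spread = Seq(12.0, 22.0, 32.0) // small error at every step
    val spike = Seq(10.0, 20.0, 36.0)  // one large error at the last step
    println(sumAbsolute(spread, real) -> sumAbsolute(spike, real))
    println(meanSquaredError(spread, real) -> meanSquaredError(spike, real))
  }
}
```

Both series have the same summed absolute distance, but the MSE penalizes the single large deviation more heavily, so the two criteria can rank candidate dynamics differently.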

Public policies, try to anticipate by optimisation

Application to Zombieland

Training

Learning

faster

endurance

informed people

communication skill

P_{formInform}

P_{runFast}

Public policies, try to predict by optimisation

Application to Zombieland

val result = zombieInvasion(
  humanFollowProbability = vhFP,
  humanInformedRatio = vhIR + (formInform * vhIR),
  humanInformProbability = vhIP - (formInform * vhIP),
  humanExhaustionProbability = physic.humanExhaustionProbability - (runFast * physic.humanExhaustionProbability),
  humanRunSpeed = physic.humanRunSpeed + (runFast * physic.humanRunSpeed),
  zombies = 4,
  humans = 250,
  steps = 500,
  random = rng)

hIR ?

hFP ?

hIP ?

calibration

v_{hIR}

v_{hFP}

v_{hIP}

optimization

v_{hFP} = 0.13234604715266718

v_{hIR} = 0.1442621096357345

v_{hIP} = 0.07780857466116062

new mechanism for public policy !

optimal values from previous calibration
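
The policy levers act as relative changes on the calibrated values. A minimal sketch of that scaling, using the calibrated v_{hIR} and v_{hIP} shown above and hypothetical helper names (`increase`, `decrease`):

```scala
object PolicyLevers {
  // Calibrated values taken from the previous calibration step.
  val vhIR = 0.1442621096357345
  val vhIP = 0.07780857466116062

  // A lever in [-1, 1] changes the calibrated value by that proportion.
  def increase(base: Double, lever: Double): Double = base + lever * base
  def decrease(base: Double, lever: Double): Double = base - lever * base

  def main(args: Array[String]): Unit = {
    val formInform = 0.5
    // Same expressions as in the zombieInvasion call of the slides.
    println(increase(vhIR, formInform)) // humanInformedRatio
    println(decrease(vhIP, formInform)) // humanInformProbability
  }
}
```

A lever of 0 leaves the calibrated value untouched, while -1 and 1 are the extreme cases: the value is either cancelled or doubled, depending on the sign used in the expression.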

Public policies, try to anticipate by optimisation

Application to Zombieland

opti 1 : limit the epidemic

peak time

peak size

opti 2 : save lives

\Delta_R = \left| 250 - TotalRescued \right|

\Delta_Z = \left| 0 - TotalZombified \right|

\Delta_R

\Delta_Z
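
Written as code, the two "save lives" objectives are plain absolute gaps to a target. The sketch assumes the 250 humans used in the simulation call; `SaveLives`, `deltaRescued` and `deltaZombified` are illustrative names.

```scala
object SaveLives {
  // Gap between the number of humans rescued and the whole population.
  def deltaRescued(totalRescued: Int, humans: Int = 250): Int =
    math.abs(humans - totalRescued)

  // Gap between the number of zombified humans and the ideal of zero.
  def deltaZombified(totalZombified: Int): Int =
    math.abs(0 - totalZombified)
}
```

Both quantities are minimized: a perfect policy would reach (0, 0), with everybody rescued and nobody zombified.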

OpenMolisation : limit the epidemic

Application to Zombieland

limit the epidemic

peak time

peak size

NSGA2Evolution(
  evaluation = model,
  genome = Seq(
    runFast in (-1.0,1.0),
    formInform in (-1.0,1.0)
  ),
  objectives = Seq(peakTime, peakSize),
  termination = 20000,
  parallelism = 200,
  distribution = Island(5 minutes),
  stochastic = Stochastic(seed = seed)
) hook (workDirectory / "results_opti/opti1") on env

OpenMolisation : limit the epidemic

Application to Zombieland

Zoom

OpenMolisation : save lives

Application to Zombieland

save lives

\Delta_R = \left| 250 - TotalRescued \right|

\Delta_Z = \left| 0 - TotalZombified \right|

\Delta_R

\Delta_Z

NSGA2Evolution(
  evaluation = model -- DeltaTask(totalZombified -> 0, totalRescued -> 250),
  genome = Seq(
    runFast in (-1.0,1.0),
    formInform in (-1.0,1.0)
  ),
  objectives = Seq(totalZombified, totalRescued),
  termination = 20000,
  parallelism = 200,
  distribution = Island(5 minutes),
  stochastic = Stochastic(seed = seed)
) hook (workDirectory / "results_opti/opti2") on env

OpenMolisation : save lives

Application to Zombieland

Zoom

Interpretation

Application to Zombieland

No Pareto front :
- No need to choose a compromise
- The criteria are compatible
- The criteria are either :
  - totally independent
  - in a "non constraining relationship"
It still converges toward good solutions.

Going further

[Thiele2014] Thiele, Jan C., Kurth, Winfried and Grimm, Volker (2014) 'Facilitating Parameter Estimation and Sensitivity Analysis of Agent-Based Models: A Cookbook Using NetLogo and 'R'' Journal of Artificial Societies and Social Simulation 17 (3) 11 <http://jasss.soc.surrey.ac.uk/17/3/11.html>. doi: 10.18564/jasss.2503

[Rey2015] REY-COYREHOURCQ, Sébastien (2015, October 13) "Une plateforme intégrée pour la construction et l’évaluation de modèles de simulation en géographie" Thèse Paris 1. https://zenodo.org/record/50212

[Banos2016] Banos, Arnaud (2016) Modéliser c'est apprendre : Itinéraire d'un géographe, Edition Matériologique

[Cottineau2016] Cottineau, Clémentine, Rey-Coyrehourcq Sébastien (2016) "Back to the future of multi modelling" , Conférence RGS , slides

[Chérel2015] Chérel G., Cottineau C., Reuillon R., 2015, « Beyond Corroboration : Strengthening Model Validation by Looking for Unexpected Patterns. », PLoS ONE 10(9), e0138212. doi:10.1371/journal.pone.0138212

[Cottineau2015] Cottineau C., Reuillon R., Chapron P., Rey-Coyrehourcq S., Pumain D., 2015, "A Modular Modelling Framework for Hypotheses Testing in the Simulation of Urbanisation.", Systems, 3, Numéro Spécial "Agent-Based Modelling of City Systems", 348-377. DOI : 10.3390/systems3040348

[Cottineau2015] Cottineau C., Chapron P., Reuillon R., 2015, “Growing models from the bottom up. An evaluation-based incremental modelling method (EBIMM) applied to the simulation of systems of cities”, Journal of Artificial Societies and Social Simulation (JASSS), Vol. 18, No. 4, 9. DOI : 10.18564/jasss.2828.

[Banos2016] Banos, A., Lang, C., & Marilleau, N. (2016). Agent-based spatial simulation with NetLogo Volume 2: Advanced Concepts. Elsevier.

[Reuillon2015] Reuillon, R., Schmitt, C., De Aldama, R., & Mouret, J.-B. (2015). A New Method to Evaluate Simulation Models: The Calibration Profile (CP) Algorithm. Journal of Artificial Societies and Social Simulation, 18(1), 12. Retrieved from http://jasss.soc.surrey.ac.uk/18/1/12.html

[Schmitt2015] Schmitt, C., Rey, S., Reuillon, R., & Pumain, D. (2015). Half a billion simulations: Evolutionary algorithms and distributed computing for calibrating the SimpopLocal geographical model. Environment and Planning B., 42(2), 300–315.

[Reuillon2013] Reuillon, R., Leclaire, M., & Rey-Coyrehourcq, S. (2013). OpenMOLE, a workflow engine specifically tailored for the distributed exploration of simulation models. Future Generation Computer Systems, 29(8), 1981–1990. https://doi.org/10.1016/j.future.2013.05.003

++ Volker GRIMM !! ++

++ David O'Sullivan !! ++

[Delay2015] Delay E., « Réflexions géographiques sur l’usage des systèmes multi-agents dans la compréhension des processus d’évolution des territoires viticoles de fortes pentes : Le cas de la Côte Vermeille et du Val di Cembra », Thèse de doctorat, Université de Limoges, Limoges, 2015. HAL-SHA
