Calibration & Optimisation
@reyman64
@ElCep
exModelo 2019
Etienne Delay
Sébastien Rey-Coyrehourcq
& Paul Chapron
Model at first sight
Local sensitivity - OAT
"Face Validity"
"One-Variable-At-A-Time"
- unreliable
- time-consuming
- limited
- difficult to reproduce
"Tinkering" Papert & Resnick
- playful
- interactive
- exploratory
- incremental
- trial and error
A first step to systematize OAT?
=> Overall Sensitivity
a simple model,
a complete experimental design:
P1 x P2 x P3
500 kB
3 plots
systematic outputs
How? How?
Run time & data volume impossible to manage
Issue no. 1 - Combinatorial explosion
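A quick way to see the explosion is to count the runs of a complete design. The sketch below is only an illustration: the number of levels per parameter and the number of replications are hypothetical values, not figures from the experiment above.

```scala
object FactorialDesign {
  // Runs in a complete design: product of the number of levels of each
  // parameter, multiplied by the replications per point.
  def designSize(levelsPerParameter: Seq[Int], replications: Int = 1): BigInt =
    levelsPerParameter.map(BigInt(_)).product * replications

  def main(args: Array[String]): Unit = {
    // Hypothetical grid: 10 levels for each of P1, P2 and P3
    println(designSize(Seq(10, 10, 10)))
    // Ten parameters instead of three, 100 replications per point
    println(designSize(Seq.fill(10)(10), replications = 100))
  }
}
```

Each added parameter multiplies the total, so both run time and output volume grow geometrically.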
Issue no. 2 - "Curse of Dimensionality"
[Figure: coverage of the space as the number of dimensions grows, for an equal number of points]
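The loss of coverage can be illustrated with a regular grid: at a fixed budget of points, the number of distinct levels available per axis shrinks as dimensions are added. This is a minimal sketch; the budget of one million points is an arbitrary assumption.

```scala
object Coverage {
  // With a fixed budget of points laid out on a regular grid,
  // each axis only gets budget^(1/dimensions) distinct levels.
  def levelsPerAxis(budget: Long, dimensions: Int): Double =
    math.pow(budget.toDouble, 1.0 / dimensions)

  def main(args: Array[String]): Unit =
    for (d <- Seq(1, 2, 3, 6, 10))
      println(f"$d%2d dimensions: ${levelsPerAxis(1000000L, d)}%.1f levels per axis")
}
```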
A first solution, more sophisticated methods.
http://reseau-mexico.fr/
A second solution, change your approach!
Organize a tension between
Input choices and Output choices
model
(mechanisms, parameters)
distance to data,
stylized fact/patterns
choosing Input
choosing Output
generating
questioning
Quantified criteria to "guide"
automated exploration
What tensioning?
criteria
Evolutionary Algorithm
Darwin-like algorithms
Evolutionary algorithms (EAs) are population-based metaheuristic optimization algorithms that use biology-inspired mechanisms like mutation, crossover, natural selection, and survival of the fittest in order to refine a set of solution candidates iteratively.
Advantages:
- Population-based => well suited to HPC
- As good for exploration as for exploitation
- Optimizes one or multiple objectives
Drawbacks:
- Greedy (in computing resources)!
- Black box
- Lots of parameters
- Termination criteria?
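To make the vocabulary concrete, here is a minimal single-objective EA sketch. Everything in it is an assumption chosen for illustration: the toy objective stands in for a simulation, the operators are a simple blend crossover and Gaussian mutation, and termination is a fixed number of generations. It is not the algorithm used later in OpenMOLE.

```scala
import scala.util.Random

object TinyEA {
  // Toy objective to minimise; a stand-in for a simulation output (hypothetical)
  def objective(x: Double): Double = math.pow(x - 0.3, 2)

  // Gaussian mutation, clamped to the parameter bounds [0, 1]
  def mutate(x: Double, rng: Random, sigma: Double = 0.1): Double =
    math.min(1.0, math.max(0.0, x + rng.nextGaussian() * sigma))

  // Arithmetic crossover: a random blend of two parents
  def crossover(a: Double, b: Double, rng: Random): Double = {
    val w = rng.nextDouble()
    w * a + (1 - w) * b
  }

  def evolve(mu: Int, lambda: Int, generations: Int, seed: Long): Double = {
    val rng = new Random(seed)
    var population = Vector.fill(mu)(rng.nextDouble())
    // Termination criterion: a fixed number of generations
    for (_ <- 1 to generations) {
      val offspring = Vector.fill(lambda) {
        val p1 = population(rng.nextInt(mu))
        val p2 = population(rng.nextInt(mu))
        mutate(crossover(p1, p2, rng), rng)
      }
      // Elitist (mu + lambda) selection: keep the mu best of parents and offspring
      population = (population ++ offspring).sortBy(objective).take(mu)
    }
    population.head
  }

  def main(args: Array[String]): Unit =
    println(evolve(mu = 10, lambda = 20, generations = 50, seed = 42L))
}
```

Even this toy version exposes several of the parameters listed above: population size, number of offspring, mutation width and the stopping rule.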
Definitions
Objectives vs Fitness/Heuristic
Heuristic function
A heuristic function is a way to inform the search about the direction of a goal. It provides an informed way to guess which path is promising. The heuristic function can be a direct or an indirect measurement, for example by providing only an approximation of the distance between a measurement and the optimum.
Here’s an algorithm for driving to someone’s house:
Take Highway 167 south to Puyallup. Take the South Hill Mall exit and drive 4.5 miles up the hill. Turn into the driveway of the large tan house on the left, at 714 North Cedar.
Here’s a heuristic for getting to someone’s house:
Find the last letter we mailed you. Drive to the town in the return address. When you get to town, ask someone where our house is. If you can’t find anyone, call us from a public phone, and we’ll come get you.
based on [McConnell 2004]
Definitions
Objectives vs Fitness/Heuristic
Objective function
The objective function can be considered as a form of heuristic, except that it is a direct measure of the potential of an aspect of the solution. In this sense, the objective function often requires more expertise on the system than the heuristic.
based on [Weise2011 p36]
Fitness function
A fitness function is a particular type of objective function that is used to summarise, as a single figure of merit, how close a given design solution is to achieving the set aims.
based on Wikipedia
Evolutionary Algorithm
Translation to simulation
The genome contains
the values of the parameters
mutate & crossover
values
simulation
Importance of both
Exploration & Exploitation
Some fitness landscapes to understand (1 obj.)
locked on a local optimum
easy landscape, one global optimum
blocked on a local optimum
jump out of a local optimum
Evolutionary Algorithm
Take the best and try again ?
Objective space : Pareto introduction
[Figure: cars a to o plotted in objective space, axes Price and Gasoil]
On the Pareto front, each car beats every other car of the front
on at least one objective
Vilfredo Pareto, economist (1848-1923)
Pareto front
 | d | c |
---|---|---|
Price | 3 | 2 |
Gasoil | 1 | 2 |
Non-Dominated Front
Take the best and try again ?
Objective space : Dominance based criteria
[Figure: cars a to o plotted in objective space, axes Price and Gasoil]
Definition:
An element x1 dominates (is preferred to) an element x2 (x1 ⊣ x2) if:
(a) x1 is better than x2 in at least one objective function
and (b) x1 is not worse than x2 with respect to all other objectives.
Does e dominate h?
e better on Price (False)? Gasoil (True)?
e not worse on Price (False)?
=> e doesn't dominate h
Does e dominate g?
e better on Price (True)? Gasoil (True)?
=> e dominates g
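The dominance test can be sketched as a small function, assuming every objective is to be minimised. The `Objectives` type alias and the function name are illustrative choices; the two points used in `main` are cars d and c from the Price/Gasoil table shown earlier.

```scala
object Dominance {
  // Objective values to minimise, e.g. Vector(price, gasoil)
  type Objectives = Vector[Double]

  // x1 dominates x2 if it is no worse on every objective
  // and strictly better on at least one.
  def dominates(x1: Objectives, x2: Objectives): Boolean = {
    require(x1.size == x2.size, "both points need the same objectives")
    val pairs = x1.zip(x2)
    pairs.forall { case (a, b) => a <= b } && pairs.exists { case (a, b) => a < b }
  }

  def main(args: Array[String]): Unit = {
    val d = Vector(3.0, 1.0) // Price 3, Gasoil 1
    val c = Vector(2.0, 2.0) // Price 2, Gasoil 2
    // Neither car dominates the other: both belong to the front
    println(dominates(d, c))
    println(dominates(c, d))
  }
}
```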
Not so easy !
Maintaining a diversity of solutions
As in many board games... winning n battles doesn't mean you win the whole war!
[Figure: parents and offspring over Iteration 1 and Iteration 2, labelled "Battles lost" and "War lost"; the second iteration keeps only the children]
Why ?
Imagine
A simulation with a thousand parameters fixed for the motors; you play with only one: power.
We keep trying to minimize Price & Gasoil for the cars.
[Figure: objective space (OPrice vs OGasoil, both from 0 to 1) showing cars a to p and the Pareto front; the laws behind the points are unknown to the optimizer]
How to distinguish between overall good solutions and locally fit solutions?
The optimizer doesn't know!
Relations between parameter & objective spaces
Optimizer: how do I select & move to the next step?
=> Fitness helps to solve this
Model
Objective Space (2 obj.)
Parameter Space (2 param.)
Not so easy !
Take the best and try again ?
Summing up the situation
EAs proceed by indirect moves along the n dimensions of the parameter space to get better values in the objective space => FITNESS
Hypothesis of EAs: a small move in parameter space can produce a better result in objective space ... (and it works really well in many cases ...)
Ranking ?
Pareto ranking & diversity algorithm
f1 | f2 | dominated by | r1 | r2 | |
---|---|---|---|---|---|
a | 3,5 | 1 | 1 | 1 | |
b | 3 | 1,5 | 1 | 1 | |
c | 2 | 2 | 1 | 1 | |
d | 0,5 | 4 | 1 | 1 | |
e | 0,5 | 4,5 | 1 | 1 | |
f | 2 | 2 | |||
g | 3 | 5 | |||
h | 2 | 2 | |||
i | 3 | 4 | |||
j | 2 | 3 | |||
k | 2 | 4 | |||
l | 2 | 2 | |||
m | 3 | 6 | |||
n | 5 | 11 | |||
o | 4 | 8 |
The ranking is used to compute a fitness
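Two common ways of turning dominance into a rank can be sketched as follows, assuming minimisation. Whether they correspond exactly to the r1 and r2 columns of the table above is an assumption; the sample points in `main` are made up for the example.

```scala
object ParetoRanking {
  type Objectives = Vector[Double]

  def dominates(x1: Objectives, x2: Objectives): Boolean = {
    val pairs = x1.zip(x2)
    pairs.forall { case (a, b) => a <= b } && pairs.exists { case (a, b) => a < b }
  }

  // Rank = 1 + number of points that dominate the point
  def dominationCountRank(points: Vector[Objectives]): Vector[Int] =
    points.map(p => 1 + points.count(q => dominates(q, p)))

  // Rank = index of the front the point belongs to: peel off the
  // non-dominated points, give them the current rank, repeat on the rest.
  def frontRank(points: Vector[Objectives]): Vector[Int] = {
    @annotation.tailrec
    def loop(remaining: Set[Int], rank: Int, acc: Map[Int, Int]): Map[Int, Int] =
      if (remaining.isEmpty) acc
      else {
        val front = remaining.filter(i => !remaining.exists(j => dominates(points(j), points(i))))
        loop(remaining -- front, rank + 1, acc ++ front.map(_ -> rank))
      }
    val ranks = loop(points.indices.toSet, 1, Map.empty)
    points.indices.map(ranks).toVector
  }

  def main(args: Array[String]): Unit = {
    // Made-up points: three on the front, three dominated
    val points = Vector(
      Vector(1.0, 4.0), Vector(2.0, 2.0), Vector(4.0, 1.0),
      Vector(3.0, 3.0), Vector(4.0, 4.0), Vector(5.0, 5.0))
    println(frontRank(points))
    println(dominationCountRank(points))
  }
}
```

Both rankings give 1 to every non-dominated point; they differ in how strongly they penalise points buried deep behind the front.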
One famous algorithm
NSGA 2
keeping the Pt candidates in the first 3 Pareto fronts to maintain diversity
Population (N) &
offspring (Q)
selection < N
generate offspring
merge
rejected
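The merge / select / reject step can be sketched as below. This follows the usual textbook description of NSGA-II (fronts filled in order, the last one truncated by crowding distance), assumes minimisation, and is not taken from the MGO implementation; all names are illustrative.

```scala
object Nsga2Selection {
  type Objectives = Vector[Double]

  def dominates(x1: Objectives, x2: Objectives): Boolean = {
    val pairs = x1.zip(x2)
    pairs.forall { case (a, b) => a <= b } && pairs.exists { case (a, b) => a < b }
  }

  // Split the points into successive non-dominated fronts
  def fronts(points: Vector[Objectives]): Vector[Vector[Objectives]] =
    if (points.isEmpty) Vector.empty
    else {
      val (front, rest) = points.partition(p => !points.exists(q => dominates(q, p)))
      front +: fronts(rest)
    }

  // Crowding distance: for each objective, sort the front and add the
  // normalised gap between a point's two neighbours; extremes get infinity.
  def crowding(front: Vector[Objectives]): Vector[Double] = {
    val distance = Array.fill(front.size)(0.0)
    for (m <- front.head.indices) {
      val order = front.indices.sortBy(i => front(i)(m))
      val range = front(order.last)(m) - front(order.head)(m)
      distance(order.head) = Double.PositiveInfinity
      distance(order.last) = Double.PositiveInfinity
      if (range > 0)
        for (k <- 1 until order.size - 1)
          distance(order(k)) += (front(order(k + 1))(m) - front(order(k - 1))(m)) / range
    }
    distance.toVector
  }

  // Merge parents and offspring, then keep n points front by front;
  // the front that does not fit is truncated, most isolated points first.
  def select(parents: Vector[Objectives], offspring: Vector[Objectives], n: Int): Vector[Objectives] =
    fronts(parents ++ offspring).foldLeft(Vector.empty[Objectives]) { (kept, front) =>
      val room = n - kept.size
      if (room <= 0) kept
      else if (front.size <= room) kept ++ front
      else {
        val d = crowding(front)
        kept ++ front.indices.sortBy(i => -d(i)).take(room).map(front)
      }
    }

  def main(args: Array[String]): Unit = {
    val parents = Vector(Vector(1.0, 4.0), Vector(2.0, 2.0), Vector(4.0, 1.0))
    val offspring = Vector(Vector(3.0, 3.0), Vector(1.5, 3.0), Vector(5.0, 5.0))
    println(select(parents, offspring, 3))
  }
}
```

Preferring isolated points inside the truncated front is what keeps the population spread along the front instead of clustering.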
OpenMole & MGO library
NSGA 2
Generational EA
Steady-state EA: as-soon-as-possible strategy
OpenMole & MGO library
NSGA 2
Steady State Island EA
counterbalancing the overhead of remote environments
OpenMole Syntax
NSGA 2
NSGA2Evolution(
  evaluation = model,
  genome = Seq(
    myParameter1 in (0.0, 1.0),
    myParameter2 in (-1.0, 1.0)
  ),
  objectives = Seq(myObjective1, myObjective2),
  termination = 20000,
  parallelism = 200,
  distribution = Island(5 minutes),
  stochastic = Stochastic(seed = seed)
) hook (workDirectory / "results/example") on env
- 2 objectives to minimize
- 2 degrees of freedom / parameters (genome)
- 200 islands (parallelism) of 5 minutes each
- 20000 evolutions performed by all the islands (200 islands * 100 iterations)
- 1 hook to save the population
ZombieLand
Trying to calibrate free parameters !
Application to Zombieland
// Free parameters to calibrate
val humanFollowProbability = Val[Double]
val humanInformedRatio = Val[Double]
val humanInformProbability = Val[Double]

// The other settings of the run stay fixed
val result = zombieInvasion(
  humanFollowProbability = humanFollowProbability,
  humanInformedRatio = humanInformedRatio,
  humanInformProbability = humanInformProbability,
  zombies = 4,
  humans = 250,
  steps = 500,
  random = rng)
Data Data Data !
Application to Zombieland
[Figure: observed evacuations (observed evac.), evacuated plotted against steps; data in dynamics.csv]
Road to the objective : which strategy to compare data (real vs. simulated) ?
Application to Zombieland
20 | 40 | ... | 500 | |
1 | ||||
2 | ||||
... | ||||
100 |
[Table labels: evac., step]
median by step or median by evac.
[Figure: axes steps and evacuated]
Road to Objective : simulated vs real data ?
Application to Zombieland
[Figure: real and simulated curves plotted on the same axes]
search in metrics
OpenMolisation : function to aggregate simulated data
Application to Zombieland
Vector[Array[Int]]
val v: Vector[Array[Int]] =
  Vector(
    Array(2, 3, 5),
    Array(2, 5, 6),
    Array(3, 7, 1))

println(v.transpose)
// return Vector(Vector(2, 2, 3), Vector(3, 5, 7), Vector(5, 6, 1))
println(v.transpose.map(column => column.sum))
// return Vector(7, 15, 12)
println(v.transpose.map(column => column.median))
// return Vector(2, 5, 5)
 | col 1 | col 2 | col 3 |
---|---|---|---|
row 1 | 2 | 3 | 5 |
row 2 | 2 | 5 | 6 |
row 3 | 3 | 7 | 1 |
median | 2 | 5 | 5 |
def distance(dataSim: Vector[Array[Int]]) = {
  val realData = Array( ... ) // my median on real data
  absoluteDistance(dataSim, ... ) // obj
}
transpose example
function to define in the workflow
aggregate.oms
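Outside OpenMOLE, the same aggregation can be sketched with the standard library only. The `median`, `medianByStep` and `absoluteDistance` helpers below are written for this example and are assumptions about what the workflow does, not OpenMOLE's own functions; the reference series in `main` is made up.

```scala
object Aggregate {
  // Median of a non-empty column; mean of the two middle values when the size is even
  def median(xs: Seq[Double]): Double = {
    val sorted = xs.sorted
    val mid = sorted.size / 2
    if (sorted.size % 2 == 1) sorted(mid) else (sorted(mid - 1) + sorted(mid)) / 2
  }

  // One row per replication, one column per time step -> median per time step
  def medianByStep(replications: Vector[Array[Int]]): Vector[Double] =
    replications.map(_.toVector).transpose.map(column => median(column.map(_.toDouble)))

  // Sum of absolute differences between two series of equal length
  def absoluteDistance(a: Seq[Double], b: Seq[Double]): Double = {
    require(a.size == b.size, "series must have the same length")
    a.zip(b).map { case (x, y) => math.abs(x - y) }.sum
  }

  def main(args: Array[String]): Unit = {
    val simulated = Vector(Array(2, 3, 5), Array(2, 5, 6), Array(3, 7, 1))
    val realData = Vector(1.0, 5.0, 7.0) // made-up reference series
    val aggregated = medianByStep(simulated)
    println(aggregated)
    println(absoluteDistance(aggregated, realData))
  }
}
```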
OpenMolisation : the last bricks
Application to Zombieland
NSGA2Evolution(
  evaluation = model,
  genome = Seq(
    humanInformedRatio in (0.0, 1.0),
    humanInformProbability in (0.0, 1.0),
    humanFollowProbability in (0.0, 1.0)
  ),
  objectives = Seq(rescuedDynamic aggregate distance),
  termination = 20000,
  parallelism = 200,
  distribution = Island(5 minutes),
  stochastic = Stochastic(seed = seed)
) hook (workDirectory / "results_calibration/distance", frequency = 20) on env
EA definition with the calibration objective
Results
Application to Zombieland
Real dynamics
Simulated dynamics
Wait a sec ...
Application to Zombieland
- optimization performed with a sum-of-distances criterion
- it doesn't capture the overall dynamic to its full extent
- Open questions:
  - introduce an additional criterion?
  - change the aggregate function (MSE?)
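The difference between the two criteria can be illustrated on made-up series: two simulated curves with the same sum of absolute distances to the data can have very different mean squared errors, because MSE penalises one large deviation more than several small ones. The function names and the numbers are assumptions for the example.

```scala
object Distances {
  def sumAbsolute(sim: Seq[Double], real: Seq[Double]): Double =
    sim.zip(real).map { case (s, r) => math.abs(s - r) }.sum

  def meanSquaredError(sim: Seq[Double], real: Seq[Double]): Double =
    sim.zip(real).map { case (s, r) => (s - r) * (s - r) }.sum / sim.size

  def main(args: Array[String]): Unit = {
    val real = Vector(0.0, 10.0, 20.0, 30.0)
    val shifted = Vector(5.0, 15.0, 25.0, 35.0) // off by 5 everywhere
    val lateJump = Vector(0.0, 10.0, 20.0, 50.0) // exact, then off by 20
    println(s"shifted:  abs=${sumAbsolute(shifted, real)} mse=${meanSquaredError(shifted, real)}")
    println(s"lateJump: abs=${sumAbsolute(lateJump, real)} mse=${meanSquaredError(lateJump, real)}")
  }
}
```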
Public policies, try to anticipate by optimisation
Application to Zombieland
Training
Learning
faster
endurance
informed people
communication skill
Public policies, try to predict by optimisation
Application to Zombieland
val result = zombieInvasion(
  humanFollowProbability = vhFP,
  humanInformedRatio = vhIR + (formInform * vhIR),
  humanInformProbability = vhIP - (formInform * vhIP),
  humanExhaustionProbability = physic.humanExhaustionProbability - (runFast * physic.humanExhaustionProbability),
  humanRunSpeed = physic.humanRunSpeed + (runFast * physic.humanRunSpeed),
  zombies = 4,
  humans = 250,
  steps = 500,
  random = rng)
hIR ?
hFP ?
hIP ?
calibration
optimization
new mechanism for public policy !
optimal values from
previous calibration
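The expressions of the form `v + (policy * v)` apply the policy variable as a relative change to a calibrated value. A minimal sketch of that arithmetic, with a hypothetical helper name and made-up values:

```scala
object Policy {
  // Relative change: policy = 0.2 raises the calibrated value by 20 %,
  // policy = -0.2 lowers it by 20 %, policy = 0 leaves it unchanged.
  def scaled(calibrated: Double, policy: Double): Double =
    calibrated + policy * calibrated

  def main(args: Array[String]): Unit = {
    val calibratedRatio = 0.5 // made-up calibrated value
    for (policy <- Seq(-1.0, -0.2, 0.0, 0.2, 1.0))
      println(s"policy=$policy -> ${scaled(calibratedRatio, policy)}")
  }
}
```

Note that nothing in this form keeps a ratio or a probability below 1 when the calibrated value is above 0.5 and the policy variable approaches 1, so the bounds of the result may be worth checking.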
Public policies, try to anticipate by optimisation
Application to Zombieland
opti 1 : limit the epidemic
peak time
peak size
opti 2 : save lives
OpenMolisation : limit the epidemic
Application to Zombieland
limit the epidemic
peak time
peak size
NSGA2Evolution(
  evaluation = model,
  genome = Seq(
    runFast in (-1.0, 1.0),
    formInform in (-1.0, 1.0)
  ),
  objectives = Seq(peakTime, peakSize),
  termination = 20000,
  parallelism = 200,
  distribution = Island(5 minutes),
  stochastic = Stochastic(seed = seed)
) hook (workDirectory / "results_opti/opti1") on env
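How `peakTime` and `peakSize` are computed is not shown on the slide. One possible reading, given a series with the number of zombies at each step (a hypothetical model output), is sketched below.

```scala
object Peak {
  // zombies(i) = number of zombies at step i (hypothetical output, assumed non-empty)
  def peakSize(zombies: Seq[Int]): Int = zombies.max

  // Step at which the maximum is first reached
  def peakTime(zombies: Seq[Int]): Int = zombies.indexOf(zombies.max)

  def main(args: Array[String]): Unit = {
    val zombies = Seq(4, 10, 25, 25, 12, 3) // made-up dynamic
    println(s"peak size = ${peakSize(zombies)}, peak time = ${peakTime(zombies)}")
  }
}
```

Since the objectives are minimized, a quantity that should be as large as possible would have to be passed with its sign flipped or an equivalent transformation.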
OpenMolisation : limit the epidemic
Application to Zombieland
Zoom
OpenMolisation : save lives
Application to Zombieland
save lives
NSGA2Evolution(
  evaluation = model -- DeltaTask(totalZombified -> 0, totalRescued -> 250),
  genome = Seq(
    runFast in (-1.0, 1.0),
    formInform in (-1.0, 1.0)
  ),
  objectives = Seq(totalZombified, totalRescued),
  termination = 20000,
  parallelism = 200,
  distribution = Island(5 minutes),
  stochastic = Stochastic(seed = seed)
) hook (workDirectory / "results_opti/opti2") on env
OpenMolisation : save lives
Application to Zombieland
Zoom
Interpretation
Application to Zombieland
- No Pareto front:
  - No need to choose a compromise
  - The criteria are compatible
- The criteria are either:
  - totally independent
  - in a "non-constraining relationship"
- It still converges toward good solutions.
Going further
[Thiele2014] Thiele, Jan C., Kurth, Winfried and Grimm, Volker (2014) 'Facilitating Parameter Estimation and Sensitivity Analysis of Agent-Based Models: A Cookbook Using NetLogo and 'R'' Journal of Artificial Societies and Social Simulation 17 (3) 11 <http://jasss.soc.surrey.ac.uk/17/3/11.html>. doi: 10.18564/jasss.2503
[Rey2015] REY-COYREHOURCQ, Sébastien (2015, October 13) "Une plateforme intégrée pour la construction et l’évaluation de modèles de simulation en géographie" Thèse Paris 1. https://zenodo.org/record/50212
[Banos2016] Banos, Arnaud (2016) Modéliser c'est apprendre : Itinéraire d'un géographe, Edition Matériologique
[Cottineau2016] Cottineau, Clémentine, Rey-Coyrehourcq Sébastien (2016) "Back to the future of multi modelling" , Conférence RGS , slides
[Chérel2015] Chérel G., Cottineau C., Reuillon R., 2015, « Beyond Corroboration : Strengthening Model Validation by Looking for Unexpected Patterns. », PLoS ONE 10(9), e0138212. doi:10.1371/journal.pone.0138212
[Cottineau2015] Cottineau C., Reuillon R., Chapron P., Rey-Coyrehourcq S., Pumain D., 2015, "A Modular Modelling Framework for Hypotheses Testing in the Simulation of Urbanisation.", Systems, 3, Numéro Spécial "Agent-Based Modelling of City Systems", 348-377. DOI : 10.3390/systems3040348
[Cottineau2015] Cottineau C., Chapron P., Reuillon R., 2015, “Growing models from the bottom up. An evaluation-based incremental modelling method (EBIMM) applied to the simulation of systems of cities”, Journal of Artificial Societies and Social Simulation (JASSS), Vol. 18, No. 4, 9. DOI : 10.18564/jasss.2828.
[Banos2016] Banos, A., Lang, C., & Marilleau, N. (2016). Agent-based spatial simulation with NetLogo Volume 2: Advanced Concepts. Elsevier.
[Reuillon2015] Reuillon, R., Schmitt, C., De Aldama, R., & Mouret, J.-B. (2015). A New Method to Evaluate Simulation Models: The Calibration Profile (CP) Algorithm. Journal of Artificial Societies and Social Simulation, 18(1), 12. Retrieved from http://jasss.soc.surrey.ac.uk/18/1/12.html
[Schmitt2015] Schmitt, C., Rey, S., Reuillon, R., & Pumain, D. (2015). Half a billion simulations: Evolutionary algorithms and distributed computing for calibrating the SimpopLocal geographical model. Environment and Planning B., 42(2), 300–315.
[Reuillon2013] Reuillon, R., Leclaire, M., & Rey-Coyrehourcq, S. (2013). OpenMOLE, a workflow engine specifically tailored for the distributed exploration of simulation models. Future Generation Computer Systems, 29(8), 1981–1990. https://doi.org/10.1016/j.future.2013.05.003
++ Volker GRIMM !! ++
++ David O'Sullivan !! ++
[Delay2015] Delay E., « Réflexions géographiques sur l’usage des systèmes multi-agents dans la compréhension des processus d’évolution des territoires viticoles de fortes pentes : Le cas de la Côte Vermeille et du Val di Cembra », Thèse de doctorat, Université de Limoges, Limoges, 2015. HAL-SHA
Model Exploration
By sebastien rey coyrehourcq