An Invitation to Reinforcement Learning
To fix ideas: Formulation of a DTMC-SIS model
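A minimal sketch of one common DTMC-SIS formulation (discrete-time chains in the style of L. J. S. Allen). It assumes that in a short time step dt at most one event occurs: a new infection with probability beta * I * (N - I) / N * dt, or a recovery with probability gamma * I * dt. The names `beta`, `gamma`, `dt` and the helper functions are hypothetical, and the formulation on the slides may differ in its details.

```python
import random

def dtmc_sis_step(i, n, beta, gamma, dt, rng):
    """One step of a DTMC SIS chain: I -> I+1, I -> I-1, or I stays."""
    p_inf = beta * i * (n - i) / n * dt   # probability of exactly one new infection
    p_rec = gamma * i * dt                # probability of exactly one recovery
    if p_inf + p_rec > 1.0:
        raise ValueError("dt too large: transition probabilities exceed 1")
    u = rng.random()
    if u < p_inf:
        return i + 1
    if u < p_inf + p_rec:
        return i - 1
    return i

def simulate_dtmc_sis(i0, n, beta, gamma, dt, steps, seed=0):
    """Return one sample path [I_0, I_1, ..., I_steps]; I = 0 is absorbing."""
    rng = random.Random(seed)
    path = [i0]
    for _ in range(steps):
        path.append(dtmc_sis_step(path[-1], n, beta, gamma, dt, rng))
    return path

path = simulate_dtmc_sis(i0=5, n=100, beta=0.5, gamma=0.25, dt=0.01, steps=1000)
```

The step size must keep p_inf + p_rec below 1 for every state; the function raises an error otherwise rather than silently producing invalid probabilities.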
Nonlinear control: HJB and DP
Given
Goal:
Design
to follow
s.t. optimize cost
Agent
Bellman Optimality
Optimal Control Problem
s.t.
[Diagram labels: Data, Spread Dynamics, Controller, Control]
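The Bellman optimality principle can be illustrated with value iteration on a small finite problem: the optimal cost-to-go satisfies V(s) = min_a [c(s, a) + discount * sum_{s'} P(s' | s, a) V(s')], and repeatedly applying this map converges when 0 <= discount < 1. The two-state model below ("H" healthy, "I" infected; actions "wait" and "treat") is a hypothetical toy, not the control problem from these slides.

```python
def value_iteration(states, actions, P, c, discount, tol=1e-10, max_iter=10_000):
    """Bellman optimality iteration for a finite MDP with stage costs (minimization).
    P[s][a] is a list of (next_state, probability); c[s][a] is the stage cost."""
    V = {s: 0.0 for s in states}
    for _ in range(max_iter):
        V_new = {
            s: min(c[s][a] + discount * sum(p * V[t] for t, p in P[s][a]) for a in actions)
            for s in states
        }
        delta = max(abs(V_new[s] - V[s]) for s in states)
        V = V_new
        if delta < tol:
            break
    # Greedy policy with respect to the converged value function
    policy = {
        s: min(actions, key=lambda a: c[s][a] + discount * sum(p * V[t] for t, p in P[s][a]))
        for s in states
    }
    return V, policy

# Hypothetical toy model: treating costs 3 once; waiting while infected costs 2 per stage
states, actions = ["H", "I"], ["wait", "treat"]
P = {"H": {"wait": [("H", 1.0)], "treat": [("H", 1.0)]},
     "I": {"wait": [("I", 1.0)], "treat": [("H", 1.0)]}}
c = {"H": {"wait": 0.0, "treat": 1.0}, "I": {"wait": 2.0, "treat": 3.0}}
V, policy = value_iteration(states, actions, P, c, discount=0.9)
```

Waiting forever in "I" would cost 2 / (1 - 0.9) = 20, so the computed policy treats in "I" (cost 3) and waits in "H" (cost 0).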
Nonlinear control: Example
How to Manage TYLCV (tomato yellow leaf curl virus)?
Replanting infected plants is subject to random fluctuations due to misleading identification: a yellow plant is not necessarily infected.
Controls
Performing parameter calibration using MCMC (Markov Chain Monte Carlo).
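A minimal random-walk Metropolis sketch of MCMC calibration for a single parameter. It assumes a hypothetical exponential-decay model I(t) = i0 * exp(-gamma * t), Gaussian observation noise with known `sigma`, and a flat prior on gamma > 0; the calibration in the TYLCV study uses its own model and likelihood, which are not shown here.

```python
import math
import random

def log_posterior(gamma, times, data, i0, sigma):
    """Flat prior on gamma > 0 plus a Gaussian log-likelihood (constants dropped)."""
    if gamma <= 0:
        return -math.inf
    sse = sum((y - i0 * math.exp(-gamma * t)) ** 2 for t, y in zip(times, data))
    return -sse / (2 * sigma ** 2)

def metropolis(times, data, i0, sigma, start, step, n_iter, seed=0):
    """Random-walk Metropolis sampler; returns the chain of gamma values."""
    rng = random.Random(seed)
    current = start
    lp_current = log_posterior(current, times, data, i0, sigma)
    samples = []
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step)      # symmetric proposal
        lp_prop = log_posterior(proposal, times, data, i0, sigma)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, lp_prop - lp_current)):
            current, lp_current = proposal, lp_prop
        samples.append(current)
    return samples

# Synthetic (hypothetical) data generated with gamma = 0.2
noise = random.Random(1)
times = list(range(11))
data = [100 * math.exp(-0.2 * t) + noise.gauss(0, 1) for t in times]
chain = metropolis(times, data, i0=100, sigma=1.0, start=0.5, step=0.005, n_iter=5000)
posterior_mean = sum(chain[1000:]) / len(chain[1000:])   # discard burn-in
```

The first 1000 draws are discarded as burn-in because the chain starts far from the posterior mode.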
Simulation: Counterfactual vs Controlled.
[Diagram labels: Spread Dynamics, Controller, Data, Control; annotations: SDEs, CTMC, Sto-Per, Sto-Modelling]
Stochastic extension
Enhancing parameter calibration through noise
OCP:
Value Function
HJB
HJB (Dynamic Programming)
HJB (Neuro-Dynamic Programming)
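Neuro-dynamic programming approximates the value function from simulated transitions instead of solving the Bellman equation exactly. A minimal sketch of the idea is tabular Q-learning, where a lookup table stands in for the neural approximator; the simulator `toy_step` and all parameter values are hypothetical.

```python
import random

def q_learning(step, states, actions, discount=0.9, alpha=0.1, epsilon=0.1,
               episodes=500, horizon=50, seed=0):
    """Tabular Q-learning for cost minimization.
    step(s, a, rng) -> (next_state, cost) is a simulator of the system."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(episodes):
        s = rng.choice(states)
        for _ in range(horizon):
            if rng.random() < epsilon:
                a = rng.choice(actions)                      # explore
            else:
                a = min(actions, key=lambda b: Q[(s, b)])    # exploit: lowest estimated cost
            s_next, cost = step(s, a, rng)
            target = cost + discount * min(Q[(s_next, b)] for b in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])        # sample-based Bellman update
            s = s_next
    policy = {s: min(actions, key=lambda b: Q[(s, b)]) for s in states}
    return Q, policy

def toy_step(s, a, rng):
    """Hypothetical deterministic simulator: healthy "H" or infected "I"."""
    if s == "H":
        return "H", (1.0 if a == "treat" else 0.0)
    return ("H", 3.0) if a == "treat" else ("I", 2.0)

Q, policy = q_learning(toy_step, ["H", "I"], ["wait", "treat"])
```

Unlike value iteration, this never reads transition probabilities; it only queries the simulator, which is what makes the approach usable when the model is available only as a black box.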
D. P. Bertsekas. Abstract dynamic programming. Athena Scientific, Belmont, MA, 2013. viii+248 pp. ISBN 978-1-886529-42-7; ISBN 1-886529-42-6.
D. P. Bertsekas. Rollout, policy iteration, and distributed reinforcement learning. Revised and updated second printing. Athena Sci. Optim. Comput. Ser. Athena Scientific, Belmont, MA, 2020. xiii+483 pp. ISBN 978-1-886529-07-6.
D. P. Bertsekas. Reinforcement learning and optimal control. Athena Sci. Optim. Comput. Ser. Athena Scientific, Belmont, MA, 2019. xiv+373 pp. ISBN 978-1-886529-39-7.
[Diagram labels: Agent, action, state, reward]
Discounted return
Total return
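The formulas for the two returns are not recoverable from the slides; assuming the standard definitions, the total return of a finite episode is G_t = R_{t+1} + R_{t+2} + ... + R_T, and the discounted return is G_t = R_{t+1} + gamma * R_{t+2} + gamma^2 * R_{t+3} + ... with 0 <= gamma < 1 (here gamma is the discount factor, not the recovery rate used later). A minimal sketch:

```python
def total_return(rewards):
    """Undiscounted (total) return: R_1 + R_2 + ... + R_T."""
    return sum(rewards)

def discounted_return(rewards, gamma):
    """Discounted return G_0 = sum_k gamma**k * R_{k+1}.
    Computed backwards with G_t = R_{t+1} + gamma * G_{t+1}, G_T = 0."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

print(discounted_return([1, 1, 1], 0.5))  # 1 + 0.5 + 0.25
```

The backward recursion is the same one-step relation that dynamic programming exploits, and with gamma = 1 it reduces to the total return.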
Modeling a traffic light warning system for acute respiratory infections as an optimal control problem
Saul Diaz Infante Velasco
Adrian Acuña Zegarra
Jorge Velasco Hernandez
sauldiazinfante@gmail.com
Introduction
We propose an extension of the classic Kermack-McKendrick model. The total population N(t) is constant and is split into four compartments:
Susceptible individuals can become infected when interacting with an infectious individual. Infected people recover after a mean period 1/γ. Recovered people lose their natural immunity after a mean period 1/θ.
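A minimal sketch of the transitions described above (infection, recovery at rate gamma, loss of immunity at rate theta), integrated with forward Euler. It assumes a standard-incidence infection term beta * S * I / N with a hypothetical transmission rate `beta`, and it covers only the S, I, R flows named in the text; the fourth compartment shown on the slide is not recoverable here and is omitted.

```python
def sirs_rhs(s, i, r, beta, gamma, theta, n):
    """Right-hand side of an SIRS model with standard incidence beta*S*I/N."""
    new_inf = beta * s * i / n
    ds = -new_inf + theta * r    # recovered people return to S at rate theta
    di = new_inf - gamma * i     # infected people recover at rate gamma
    dr = gamma * i - theta * r
    return ds, di, dr

def simulate_sirs(s0, i0, r0, beta, gamma, theta, dt, steps):
    """Forward-Euler integration; returns a list of (S, I, R) tuples."""
    n = s0 + i0 + r0             # ds + di + dr = 0, so N stays constant
    s, i, r = s0, i0, r0
    traj = [(s, i, r)]
    for _ in range(steps):
        ds, di, dr = sirs_rhs(s, i, r, beta, gamma, theta, n)
        s, i, r = s + dt * ds, i + dt * di, r + dt * dr
        traj.append((s, i, r))
    return traj

traj = simulate_sirs(990.0, 10.0, 0.0, beta=0.3, gamma=0.1, theta=0.01, dt=0.1, steps=2000)
```

Because the three derivatives sum to zero, the Euler scheme preserves N up to rounding, matching the constant-population assumption.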
Risk index
Another closely related index that has been used to monitor and evaluate the spread of acute respiratory infections (ARI) is the event gathering risk index developed by Chande et al. (2020).
Chande, A., Lee, S., Harris, M. et al. Real-time, interactive website for US-county-level COVID-19 event risk assessment. Nat Hum Behav, 1313–1319 (2020). https://doi.org/10.1038/s41562-020-01000-9
# MODEL FORMULATION
Risk index
The risk index gives the probability of finding, at time t, at least one infected person in a group of k individuals.
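The formula itself is missing from the slide. A minimal sketch, assuming the event-risk form used by Chande et al., r(t) = 1 - (1 - I(t)/N)^k, which treats the k attendees as independent draws from a population with prevalence I(t)/N and ignores any under-reporting correction:

```python
def risk_index(infected, population, k):
    """Probability that at least one of k randomly drawn people is infected,
    assuming independent draws with prevalence p = infected / population."""
    if population <= 0 or not 0 <= infected <= population:
        raise ValueError("need population > 0 and 0 <= infected <= population")
    p = infected / population
    return 1.0 - (1.0 - p) ** k

r = risk_index(1000, 100_000, 50)   # prevalence 1%, group of 50: roughly 0.395
```

Even at 1% prevalence, a gathering of 50 people has close to a 40% chance of including at least one infected person, which is why the index grows quickly with group size.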
# REPRODUCTIVE NUMBER
Normalization
Invariance
Basic reproductive number
FDE (disease-free equilibrium)
Effective reproduction number
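The expressions for the reproductive numbers are not recoverable from the slides. As a hedged illustration only, for an SIRS model with standard incidence beta * S * I / N (an assumption, not necessarily the authors' model), the next-generation computation at the disease-free equilibrium S = N gives R0 = beta / gamma, and the effective reproduction number scales it by the susceptible fraction:

```python
def basic_reproduction_number(beta, gamma):
    """R0 = transmission rate times mean infectious period 1/gamma."""
    return beta / gamma

def effective_reproduction_number(beta, gamma, s, n):
    """R_e(t) = R0 * S(t) / N: R0 reduced by the fraction still susceptible."""
    return basic_reproduction_number(beta, gamma) * s / n

r0 = basic_reproduction_number(beta=0.5, gamma=0.25)              # 2.0
re = effective_reproduction_number(0.5, 0.25, s=500.0, n=1000.0)  # 1.0
```

The epidemic grows while R_e(t) > 1 and declines once the susceptible fraction drops below 1 / R0.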
# Traffic Light Policies and Optimal Control
Hypothesis
(OCP) At each stage (one week), decide the light color that minimizes the cost functional J
subject to:
Counterfactual vs controlled dynamics
The influence of mobility restriction expenses on prevalence and cost
Expenses due to mobility restrictions
The influence of the decision period span on prevalence and cost
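The traffic-light OCP can be sketched on a toy scale. This assumes a hypothetical SIRS-type model, four hypothetical light colors with made-up transmission multipliers and restriction expenses, and a cost J equal to the sum over stages of weighted prevalence plus expenses; none of these values come from the study. Instead of dynamic programming it uses exhaustive search over color sequences, which is exact but only practical for a few stages. The parameter `days` plays the role of the decision period span.

```python
from itertools import product

# Hypothetical light colors: (multiplier on transmission rate, restriction expense per stage)
LIGHTS = {"green": (1.0, 0.0), "yellow": (0.8, 1.0),
          "orange": (0.6, 2.0), "red": (0.4, 3.0)}

def simulate_stage(x, beta, gamma, theta, mult, days=7, dt=0.1):
    """Advance a hypothetical SIRS model over one decision stage with forward Euler."""
    s, i, r = x
    n = s + i + r
    for _ in range(round(days / dt)):
        new_inf = mult * beta * s * i / n
        # Simultaneous update: the right-hand side uses the old (s, i, r)
        s, i, r = (s + dt * (-new_inf + theta * r),
                   i + dt * (new_inf - gamma * i),
                   r + dt * (gamma * i - theta * r))
    return s, i, r

def cost_of_plan(plan, x0, beta, gamma, theta, w_inf, days=7):
    """J = sum over stages of w_inf * (infected at stage end) + restriction expense."""
    x, total = x0, 0.0
    for color in plan:
        mult, expense = LIGHTS[color]
        x = simulate_stage(x, beta, gamma, theta, mult, days)
        total += w_inf * x[1] + expense
    return total

def best_plan(stages, x0, beta, gamma, theta, w_inf, days=7):
    """Exhaustive search over all len(LIGHTS)**stages color sequences."""
    return min(product(LIGHTS, repeat=stages),
               key=lambda plan: cost_of_plan(plan, x0, beta, gamma, theta, w_inf, days))

plan = best_plan(3, (990.0, 10.0, 0.0), beta=0.3, gamma=0.1, theta=0.01, w_inf=0.5)
```

With `w_inf = 0` prevalence is free and the search returns all-green plans; raising `w_inf` or the expenses in `LIGHTS` shifts the trade-off, which mirrors the sensitivity questions on the preceding slides.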
Perspectives
References
Thank you
https://slides.com/sauldiazinfantevelasco/code-33ee77/fullscreen