Inventory management of a scarce vaccine for epidemic control: Uncertainty quantification of time for deliveries and order sizes based on a model of sequential decisions

Towards Reinforcement Learning

SEMINARIO SOBRE MÉTODOS MATEMÁTICOS Y ALGORITMOS (SEMINAR ON MATHEMATICAL METHODS AND ALGORITHMS).

Yofre H. Garcia

Saúl Diaz-Infante Velasco

Jesús Adolfo Minjárez Sosa

sauldiazinfante@gmail.com

October 02, 2024

When a vaccine is in short supply, sometimes refraining from vaccination is the best response—at least for a while.

On October 13, 2020, the Mexican government announced a plan for vaccine deliveries from Pfizer-BioNTech and other firms as part of the COVID-19 vaccination campaign.

Given a vaccine shipment calendar, describe stock management under a backup protocol and quantify the random fluctuations in delivery times and order sizes.

Then incorporate these inventory dynamics into an ODE system that describes the disease, and evaluate the system's response accordingly.


Nonlinear control: HJB and DP

Given

\frac{dx}{dt} = f(x(t))

Goal: design a control that makes the system follow a desired behavior while optimizing a cost.

\text{action } a_t\in \mathcal{A},

\text{state } x_t

\frac{dx_{t}}{dt} =f(x_{t} ,a_{t})

Diagram: the Agent chooses an action a_t\in \mathcal{A} and observes the state x_t.

J(x_t, a_t, 0, T) = Q(x_T, T) + \int_0^T \mathcal{L}(x_{\tau}, a_{\tau})\, d\tau
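To make the cost functional concrete, here is a minimal numerical sketch. It assumes a scalar state, a feedback control a_t = u(t, x_t), forward Euler for the dynamics, and a left Riemann sum for the integral. The function name `cost_functional` and the test problem are hypothetical and do not come from the slides.

```python
from typing import Callable


def cost_functional(
    f: Callable[[float, float], float],
    running_cost: Callable[[float, float], float],
    terminal_cost: Callable[[float, float], float],
    control: Callable[[float, float], float],
    x0: float,
    T: float,
    n_steps: int = 10_000,
) -> float:
    """Approximate J = Q(x_T, T) + int_0^T L(x_tau, a_tau) dtau.

    f(x, a) is the controlled vector field, running_cost is L(x, a),
    terminal_cost is Q(x_T, T), and control(t, x) returns the action a_t.
    The state is advanced with forward Euler and the integral is a left
    Riemann sum on the same time grid.
    """
    h = T / n_steps
    x, t, integral = x0, 0.0, 0.0
    for _ in range(n_steps):
        a = control(t, x)
        integral += running_cost(x, a) * h  # accumulate L(x_t, a_t) * h
        x += f(x, a) * h                    # Euler step of dx/dt = f(x, a)
        t += h
    return terminal_cost(x, T) + integral


# Hypothetical test problem: dx/dt = -x + a, L = x^2 + a^2, Q = 0, no control.
J = cost_functional(
    f=lambda x, a: -x + a,
    running_cost=lambda x, a: x**2 + a**2,
    terminal_cost=lambda x, T: 0.0,
    control=lambda t, x: 0.0,
    x0=1.0,
    T=1.0,
)
print(J)  # close to the exact value (1 - e^{-2}) / 2
```

With a = 0 the exact solution is x(t) = e^{-t}, so the printed value can be checked against the integral of e^{-2t} over [0, 1].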



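Diagram: agent-environment loop. The Agent, in state x_t, takes an action a_t\in \mathcal{A}(x_t); the environment returns the reward R_{t+1} and the next state x_{t+1}, generating the sequence x_0, a_0, R_1, \ldots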

x_0, a_0, R_1, x_1, a_1, R_2, \cdots, x_t, a_t, R_{t+1} \cdots, x_{T-1}, a_{T-1}, R_T, x_T

Total return:

G_t := R_{t+1} + \cdots + R_{T-1} + R_{T}

Discounted return:

\begin{aligned} G_t &:= R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3}+ \cdots \\ &= \sum_{k=0}^{\infty} \gamma^{k} R_{t+1+k}, \qquad \gamma \in [0,1) \end{aligned}

p(s^{\prime},r | s, a) := \mathbb{P}[x_t=s^{\prime}, R_{t}=r | x_{t-1}=s, a_{t-1}=a]

\begin{aligned} r(s, a) &:= \mathbb{E}[ R_t | x_{t-1}=s, a_{t-1}=a ] \\ &= \sum_{r\in \mathcal{R}} r \sum_{s^{\prime}\in \mathcal{S}} p(s^{\prime}, r | s, a) \end{aligned}

Discounted return over a finite horizon T:

\begin{aligned} G_t &:= R_{t+1} + \gamma R_{t+2} + \cdots + \gamma^{T-t-1} R_{T} \\ &=\sum_{k=t+1}^{T} \gamma^{k-t-1} R_k \end{aligned}
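The total and discounted returns can be computed from a finite reward sequence with the backward recursion G_t = R_{t+1} + \gamma G_{t+1}, starting from G_T = 0. The sketch below assumes the rewards are given as a plain list [R_{t+1}, ..., R_T]. The function names are illustrative only.

```python
def total_return(rewards):
    """G_t = R_{t+1} + ... + R_T, with rewards = [R_{t+1}, ..., R_T]."""
    return sum(rewards)


def discounted_return(rewards, gamma):
    """G_t = sum_k gamma^k R_{t+1+k} over the given (finite) reward list."""
    g = 0.0
    # Backward recursion G_t = R_{t+1} + gamma * G_{t+1}, with G_T = 0.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


rewards = [1.0, 2.0, 3.0]               # R_{t+1}, R_{t+2}, R_{t+3}
print(total_return(rewards))            # 6.0
print(discounted_return(rewards, 0.5))  # 1 + 0.5*2 + 0.25*3 = 2.75
print(discounted_return(rewards, 0.0))  # gamma = 0 keeps only R_{t+1}: 1.0
```

With \gamma = 1 the discounted return reduces to the total return, and with \gamma = 0 only the immediate reward survives.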

\begin{aligned} v_{\pi}(s) &:= \mathbb{E}_{\pi} [G_t | x_t = s] \\ &= \mathbb{E}_{\pi} \left[ \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \big| x_t =s \right] \end{aligned}

\pi(a|s):= \mathbb{P}[a_t = a|x_t=s]

\begin{aligned} v_{\pi}(s) &:= \mathbb{E}_{\pi} [G_t | x_t = s] \\ &= \mathbb{E}_{\pi} \left[ \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \big| x_t =s \right] \\ & = \mathbb{E}_{\pi} \left[ R_{t+1} + \gamma G_{t+1} | x_t = s \right] \\ &= \sum_{a} \pi(a |s) \sum_{s^{\prime}, r} p(s^{\prime},r | s, a) \left[ r + \gamma \mathbb{E}_{\pi}[G_{t+1} | x_{t+1}=s^{\prime}] \right] \\ &= \sum_{a} \pi(a |s) \sum_{s^{\prime}, r} p(s^{\prime},r | s, a) \left[ r + \gamma v_{\pi}(s^{\prime}) \right] \end{aligned}
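The last line of the Bellman expectation equation can be used directly as an update rule, which gives iterative policy evaluation. In the minimal sketch below, the dynamics p(s', r | s, a) are stored as lists of (probability, next_state, reward) triples, and the policy \pi(a|s) is stored as a nested dict. This data layout and the two-state example are assumptions made for illustration.

```python
def expected_reward(p, s, a):
    """r(s, a) = sum_{s', r} r * p(s', r | s, a)."""
    return sum(prob * r for prob, _, r in p[s][a])


def policy_evaluation(p, pi, gamma, tol=1e-10):
    """Iterate v(s) <- sum_a pi(a|s) sum_{s', r} p(s', r|s, a) [r + gamma v(s')].

    p[s][a] is a list of (probability, next_state, reward) triples and
    pi[s][a] is the probability of choosing action a in state s.
    Values are updated in place (a Gauss-Seidel sweep), which also
    converges for gamma < 1.
    """
    v = {s: 0.0 for s in p}
    while True:
        delta = 0.0
        for s in p:
            new_v = sum(
                pi[s][a] * sum(prob * (r + gamma * v[s2]) for prob, s2, r in p[s][a])
                for a in p[s]
            )
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v
        if delta < tol:
            return v


# Hypothetical two-state chain: A -> B with reward 0, B -> A with reward 1.
p = {
    "A": {"go": [(1.0, "B", 0.0)]},
    "B": {"go": [(1.0, "A", 1.0)]},
}
pi = {"A": {"go": 1.0}, "B": {"go": 1.0}}
v = policy_evaluation(p, pi, gamma=0.5)
print(v)  # approximately A = 2/3, B = 4/3
```

For this chain, v(A) = 0.5 v(B) and v(B) = 1 + 0.5 v(A), which can be solved by hand to check the output.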

\begin{aligned} v_{*}(s) &:= \max_{\pi} v_{\pi}(s) \\ &= \max_{a\in \mathcal{A}(s)} \mathbb{E}_{\pi_{*}} \left[ \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \big| x_t =s, a_t = a \right] \\ & = \max_{a\in \mathcal{A}(s)} \mathbb{E}_{\pi_{*}} \left[ R_{t+1} + \gamma G_{t+1} | x_t = s, a_t = a \right] \\ &= \max_{a\in \mathcal{A}(s)} \mathbb{E} \left[ R_{t+1} + \gamma v_{*}(x_{t+1}) | x_t = s, a_t = a \right] \\ &= \max_{a\in \mathcal{A}(s)} \sum_{s^{\prime}, r} p(s^{\prime},r | s, a) \left[ r + \gamma v_{*}(s^{\prime}) \right] \end{aligned}
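Using the Bellman optimality equation as an update rule gives value iteration. The sketch below uses the same hypothetical (probability, next_state, reward) layout as before and returns a greedy policy. Its toy MDP is invented for illustration: the optimal action in state A has a lower immediate reward than the alternative.

```python
def value_iteration(p, gamma, tol=1e-10):
    """Iterate v(s) <- max_a sum_{s', r} p(s', r|s, a) [r + gamma v(s')].

    p[s][a] is a list of (probability, next_state, reward) triples.
    Returns the approximate optimal values and a greedy policy.
    """
    def q_value(v, s, a):
        return sum(prob * (r + gamma * v[s2]) for prob, s2, r in p[s][a])

    v = {s: 0.0 for s in p}
    while True:
        delta = 0.0
        for s in p:
            best = max(q_value(v, s, a) for a in p[s])
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {s: max(p[s], key=lambda a, s=s: q_value(v, s, a)) for s in p}
    return v, policy


# "stay" in A pays 1 now; "move" pays 0 now but leads to B, where "stay" pays 2.
p = {
    "A": {"stay": [(1.0, "A", 1.0)], "move": [(1.0, "B", 0.0)]},
    "B": {"stay": [(1.0, "B", 2.0)]},
}
v_star, policy = value_iteration(p, gamma=0.9)
print(v_star)  # approximately A = 18.0, B = 20.0
print(policy)  # {'A': 'move', 'B': 'stay'}
```

Here v_*(B) = 2 / (1 - 0.9) = 20. In state A, moving is worth 0.9 \cdot 20 = 18, while staying is worth only 1 + 0.9 \cdot 18 = 17.2.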


\begin{aligned} \gamma &= 0 \\ G_t &:= R_{t+1} + \cancel{\gamma R_{t+2}} + \cdots + \cancel{\gamma^{T-t-1} R_{T}} \end{aligned}

Dopamine Reward

\begin{aligned} &\min_{a_{0}^{(k)} \in \mathcal{A}_0} c(x_{\cdot}, a_0^{(k)}):= c_1(a_0^{(k)})\cdot T^{(k)} + \sum_{n=0}^{N-1} c_0(t_n, x_{t_n}) \cdot h \\ \text{ s.t.} & \\ & x_{t_{n+1}} = x_{t_n} + F(x_{t_n}, \theta, a_0^{(k)}) \cdot h, \quad x_{t_0} = x(0) \end{aligned}
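The discretized problem can be sketched as follows: simulate the Euler scheme for each candidate action in a finite set \mathcal{A}_0, evaluate c_1(a_0) T + \sum_n c_0(t_n, x_{t_n}) h, and keep the minimizer. The epidemic model F of the talk is not given here. The scalar F, the costs c_0 and c_1, and all parameter values below are therefore hypothetical stand-ins. Brute-force search is only one simple way to solve the minimization.

```python
def discretized_cost(a0, F, c0, c1, theta, x0, T, N):
    """c(x, a0) = c1(a0) * T + sum_{n=0}^{N-1} c0(t_n, x_{t_n}) * h.

    The states follow the Euler scheme
    x_{t_{n+1}} = x_{t_n} + F(x_{t_n}, theta, a0) * h, with t_n = n * h, h = T / N.
    """
    h = T / N
    x, running = x0, 0.0
    for n in range(N):
        running += c0(n * h, x) * h   # running cost at t_n
        x = x + F(x, theta, a0) * h   # Euler step with the constant action a0
    return c1(a0) * T + running


def best_constant_action(actions, **problem):
    """Brute-force minimization of the discretized cost over a finite set A_0."""
    return min(actions, key=lambda a: discretized_cost(a, **problem))


# Hypothetical scalar model: logistic growth theta*x*(1-x) reduced by the action a*x.
problem = dict(
    F=lambda x, theta, a: theta * x * (1.0 - x) - a * x,
    c0=lambda t, x: x,           # running cost: current state level
    c1=lambda a: 0.05 * a,       # hypothetical per-unit-time cost of the action
    theta=0.5,
    x0=0.01,
    T=10.0,
    N=1_000,
)
actions = [0.0, 0.25, 0.5, 0.75, 1.0]
print(best_constant_action(actions, **problem))
```

The action cost c_1 penalizes strong interventions, while c_0 penalizes a high state level. The minimizer is the action that balances the two terms over the horizon.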

\begin{aligned} C(x_{t^{(k+1)}}, & a_{t^{(k+1)}}) = \\ & C_{YLL}(x_{t^{(k+1)}},a_{t^{(k+1)}}) \\ +& C_{YLD}(x_{t^{(k+1)}},a_{t^{(k+1)}}) \\ +& C_{stock}(x_{t^{(k+1)}},a_{t^{(k+1)}}) \\ +& C_{campaign}(x_{t^{(k+1)}},a_{t^{(k+1)}}) \end{aligned}

\begin{aligned} x_{t_{n+1}}^{(k)} & = x_{t_n}^{(k)} + F(x_{t_n}^{(k)}, \theta^{(k)}, a_0^{(k)}) \cdot h^{(k)}, \quad x_{t_0^{(k)}} = x^{(k)}(0), \\ \text{where: }& \\ t_n^{(k)} &:= n \cdot h^{(k)}, \quad n = 0, \cdots, N^{(k)}, \quad t_{N}^{(k)} = T^{(k)}. \end{aligned}

J(x,\pi) = \mathbb{E}\left[ \sum_{k=0}^M C(x_{t^{(k)}},a_{t^{(k)}}) | x_{t^{(0)}} = x , \pi \right]
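Because the shipment times and sizes are random, J(x, \pi) is an expectation. One simple way to approximate it is Monte Carlo simulation: average the accumulated cost over many simulated episodes. In the sketch below, the one-period simulator `step` is a hypothetical placeholder for the vaccine-inventory and epidemic dynamics. It returns the cost accrued over [t^{(k)}, t^{(k+1)}] and the next decision-epoch state.

```python
import random


def estimate_expected_cost(x0, policy, step, M, n_episodes=10_000, seed=0):
    """Monte Carlo estimate of J(x, pi) = E[ sum_{k=0}^{M} C_k | x_{t^(0)} = x, pi ].

    policy(k, x) returns the action at decision epoch k, and
    step(x, a, rng) -> (cost, next_state) simulates one period.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_episodes):
        x, episode_cost = x0, 0.0
        for k in range(M + 1):            # decision epochs k = 0, ..., M
            a = policy(k, x)
            cost, x = step(x, a, rng)
            episode_cost += cost
        total += episode_cost
    return total / n_episodes             # sample mean over episodes


# Hypothetical simulator: the per-period cost is 1 with probability 0.5, else 0.
def noisy_step(x, a, rng):
    cost = 1.0 if rng.random() < 0.5 else 0.0
    return cost, x


print(estimate_expected_cost(0.0, lambda k, x: 0.0, noisy_step, M=3))  # near 2.0
```

With M = 3 there are four periods, each with expected cost 0.5. The estimate should therefore be close to 2.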

\begin{aligned} C_{YLL}(x_{t^{(k+1)}},a_{t^{(k+1)}}) &= \int_{t^{(k)}}^{t^{(k+1)}} YLL(x_t, a_t) dt, \\ C_{YLD}(x_{t^{(k+1)}},a_{t^{(k+1)}}) &= \int_{t^{(k)}}^{t^{(k+1)}} YLD(x_t, a_t) dt \\ YLL(x_t, a_t) &:= m_1 p \delta_E (E(t) - E^{t^{(k)}} ), \\ YLD(x_t, a_t) &:= m_2 \theta \alpha_S(E(t) - E^{t^{(k)}}), \\ t &\in [t^{(k)},t^{(k + 1)}] \end{aligned}

\begin{aligned} C_{stock}(x_{t^{(k+1)}},a_{t^{(k+1)}}) & = \int_{t^{(k)}}^{t^{(k+1)}} m_3(K_{Vac}(t) - K_{Vac}^{t^{(k)}}) dt \\ C_{campaign}(x_{t^{(k+1)}},a_{t^{(k+1)}}) &=\int_{t^{(k)}}^{t^{(k+1)}} m_4(X_{vac}(t) - X_{vac}^{t^{(k)}}) dt \end{aligned}
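Given state trajectories sampled on [t^{(k)}, t^{(k+1)}], the four cost components can be approximated by numerical quadrature of the increments with respect to the values at t^{(k)}. The sketch below uses the trapezoidal rule, which is a choice made here rather than something stated in the slides. It treats E, K_{Vac}, X_{vac} and the weights m_1, \ldots, m_4, p, \delta_E, \theta, \alpha_S as given inputs, since the underlying model is not shown in this section.

```python
def trapezoid(y, t):
    """Trapezoidal rule for samples y taken at increasing times t."""
    return sum((t[i + 1] - t[i]) * (y[i] + y[i + 1]) / 2.0 for i in range(len(t) - 1))


def period_costs(t, E, K_vac, X_vac, m1, m2, m3, m4, p, delta_E, theta, alpha_S):
    """Cost components on [t^(k), t^(k+1)] from sampled trajectories.

    Each integrand uses the increment with respect to the value at t^(k),
    which is the first sample of each trajectory.
    """
    E0, K0, X0 = E[0], K_vac[0], X_vac[0]
    return {
        "YLL": trapezoid([m1 * p * delta_E * (e - E0) for e in E], t),
        "YLD": trapezoid([m2 * theta * alpha_S * (e - E0) for e in E], t),
        "stock": trapezoid([m3 * (kv - K0) for kv in K_vac], t),
        "campaign": trapezoid([m4 * (xv - X0) for xv in X_vac], t),
    }


t = [0.0, 0.5, 1.0]
linear = [0.0, 0.5, 1.0]  # toy trajectories growing linearly from 0
costs = period_costs(t, linear, linear, linear,
                     m1=1.0, m2=1.0, m3=1.0, m4=1.0,
                     p=1.0, delta_E=1.0, theta=1.0, alpha_S=1.0)
print(costs)                # each component equals int_0^1 s ds = 0.5
print(sum(costs.values()))  # 2.0, the total stage cost C
```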

Diagram: the Agent applies an action a_t\in \mathcal{A} to the system \frac{dx_{t}}{dt} =f(x_{t} ,a_{t}) and observes the state x_t and the cost C_t.

\begin{aligned} a_t^{(k)} &= p_i \cdot \Psi_V^{(k)} \\ p_i &\in \mathcal{A}:=\{p_0, p_1, \dots, p_M\} \\ p_i &\in [0, 1] \end{aligned}
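The finite action set can be built by scaling \Psi_V^{(k)} by each fraction p_i. In the sketch below, \Psi_V^{(k)} is treated as a nonnegative scalar, and the fractions are assumed to lie on a uniform grid p_i = i/M. The slides only require p_i \in [0, 1], so the grid spacing is a hypothetical choice.

```python
def action_set(M, psi_v):
    """Finite action set {p_i * Psi_V : i = 0, ..., M} with p_i = i / M.

    The uniform grid p_i = i / M is an assumption; the slides only
    require p_i in [0, 1].
    """
    if M < 1:
        raise ValueError("M must be at least 1")
    return [(i / M) * psi_v for i in range(M + 1)]


print(action_set(4, 1000.0))  # [0.0, 250.0, 500.0, 750.0, 1000.0]
```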

Deterministic Control

https://slides.com/sauldiazinfantevelasco/cinvestav-smma-oct-02-2024

Thank you!

CINVESTAV-SeminarioSobreMetodosMatematicosAlgoritmos

By Saul Diaz Infante Velasco
