Causally Abstracted Multi-armed Bandits

Fabio Massimo Zennaro Nicholas Bishop Joel Dyer
Yorgos Felekis Anisoara Calinescu Michael Wooldridge
Theodoros Damoulas

Introduction

  • Causal abstraction is becoming an increasingly popular paradigm to relate causal models
  • But how can causal abstraction be used to effectively transfer decision-making policies?
  • To investigate this question we introduce Causally Abstracted Multi-armed Bandits (CAMABs)
  • We study several generic approaches for transferring information between causal MABs (CMABs) via causal abstraction

Causal Abstraction

  • Given two SCMs
\mathcal{M} = \langle \mathcal{X}, \mathcal{U}, \mathcal{F}, \mathbb{P}(\mathcal{U}) \rangle
\mathcal{M^{\prime}} = \langle \mathcal{X}^{\prime}, \mathcal{U}^{\prime}, \mathcal{F}^{\prime}, \mathbb{P}(\mathcal{U}^{\prime}) \rangle
  • An abstraction is a tuple
\langle V, m, \alpha \rangle

  • V \subset \mathcal{X} is the subset of relevant variables in \mathcal{X}
  • m: V \to \mathcal{X}^{\prime} is a surjective map from the relevant variables to the abstract variables
  • \alpha = (\alpha_{X^{\prime}})_{X^{\prime} \in \mathcal{X}^{\prime}} is a collection of maps, one for each abstract variable,
\alpha_{X^{\prime}}: \mathbb{D}[m^{-1}(X^{\prime})] \to \mathbb{D}[X^{\prime}]
sending the domain of the mapped base variables to the domain of the abstract variable

Causal Abstraction: Example

[Figure: a base SCM with variables \texttt{T} \to \texttt{M} \to \texttt{Y} and an abstract SCM with variables \texttt{T}^{\prime} \to \texttt{Y}^{\prime}, related by the abstraction \alpha]

  • Intervention sets:
\mathcal{I} = \{ \text{do}(\texttt{T} = 0), \text{do}(\texttt{T} = 1), \text{do}(\texttt{T} = 2) \}
\mathcal{I}^{\prime} = \{ \text{do}(\texttt{T}^{\prime} = 0), \text{do}(\texttt{T}^{\prime} = 1)\}
  • Relevant variables and variable-level map:
V = \{\texttt{T}, \texttt{Y}\}
m(\texttt{T}) = \texttt{T}^{\prime}
m(\texttt{Y}) = \texttt{Y}^{\prime}
  • Domain-level maps:
\alpha_{\texttt{T}^{\prime}}(0) = 0, \quad \alpha_{\texttt{T}^{\prime}}(1) = 1, \quad \alpha_{\texttt{T}^{\prime}}(2) = 1
\alpha_{\texttt{Y}^{\prime}} = \text{id}
  • The base interventions \text{do}(\texttt{T} = 0), \text{do}(\texttt{T} = 1), \text{do}(\texttt{T} = 2) are thus mapped to the abstract interventions \text{do}(\texttt{T}^{\prime} = 0), \text{do}(\texttt{T}^{\prime} = 1), \text{do}(\texttt{T}^{\prime} = 1)
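
A minimal Python sketch of this example, encoding V, m and the domain-level maps α as plain dictionaries; the representation is illustrative and not taken from the paper:

```python
# A minimal sketch of the example abstraction <V, m, alpha> as plain Python data.
# Variable and value names follow the slides; the encoding itself is illustrative.

V = {"T", "Y"}                            # relevant base variables
m = {"T": "T'", "Y": "Y'"}                # surjective variable-level map

alpha = {
    "T'": {0: 0, 1: 1, 2: 1},             # alpha_{T'} merges T = 1 and T = 2
    "Y'": lambda y: y,                    # alpha_{Y'} is the identity
}

def abstract_intervention(var, value):
    """Map a base intervention do(var = value) to its abstract counterpart."""
    var_abs = m[var]
    map_abs = alpha[var_abs]
    value_abs = map_abs(value) if callable(map_abs) else map_abs[value]
    return var_abs, value_abs

for t in (0, 1, 2):                       # the three base arms collapse onto two abstract arms
    v, x = abstract_intervention("T", t)
    print(f"do(T={t}) -> do({v}={x})")
```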

Interventional Consistency

[Figure: commutative diagram relating a base intervention on value x, its abstract counterpart x^{\prime} = \alpha_{X^{\prime}}(x), and the induced reward distributions \mathbb{P}(Y \mid \text{do}(x)) and \mathbb{P}(Y^{\prime} \mid \text{do}(x^{\prime})), linked through \alpha_{Y^{\prime}}]

  • If this diagram commutes for all interventions, we say that the abstraction is interventionally consistent
  • Unfortunately, it is unlikely that we can get all such diagrams to commute: pushing \mathbb{P}(Y \mid \text{do}(x)) through \alpha_{Y^{\prime}} (the distribution labelled Y^{\prime\prime} in the figure) need not match \mathbb{P}(Y^{\prime} \mid \text{do}(x^{\prime}))
  • This motivates the definition of the interventional consistency (IC) error

IC Error

  • The IC error is the worst-case difference between abstracting then intervening versus intervening then abstracting
e(\alpha) = \max_{\text{do}(x) \in \mathcal{I}} D_{W_{2}}\big(\alpha(\mathbb{P}\{Y \mid \text{do}(x)\}),\; \mathbb{P}\{Y^{\prime} \mid \alpha(\text{do}(x))\}\big)
  • The first argument, \alpha(\mathbb{P}\{Y \mid \text{do}(x)\}), intervenes then abstracts; the second, \mathbb{P}\{Y^{\prime} \mid \alpha(\text{do}(x))\}, abstracts then intervenes; D_{W_{2}} is the Wasserstein distance
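
As an illustration of how e(α) could be estimated from samples, the sketch below pushes base interventional samples through α and compares them with abstract interventional samples using an empirical one-dimensional 2-Wasserstein distance; the sampling functions and probabilities are made up, and access to both models is assumed:

```python
import numpy as np

def w2_empirical(x, y):
    """Empirical 2-Wasserstein distance between two 1-D samples of equal size."""
    x, y = np.sort(np.asarray(x)), np.sort(np.asarray(y))
    return np.sqrt(np.mean((x - y) ** 2))

def ic_error(base_arms, sample_base, sample_abstract, alpha_reward, alpha_arm, n=10_000):
    """Monte Carlo estimate of e(alpha) = max_a W2( alpha(P(Y|do(a))), P(Y'|do(alpha(a))) )."""
    errors = []
    for a in base_arms:
        y = sample_base(a, n)                       # intervene in M, sample Y...
        y_pushed = alpha_reward(y)                  # ...then abstract the rewards
        y_abs = sample_abstract(alpha_arm(a), n)    # abstract the intervention, sample Y'
        errors.append(w2_empirical(y_pushed, y_abs))
    return max(errors)

# Toy usage with made-up Bernoulli reward probabilities.
rng = np.random.default_rng(0)
p_base = {0: 0.3, 1: 0.7, 2: 0.65}                  # P(Y=1 | do(T=t)) in M (made up)
p_abs = {0: 0.3, 1: 0.68}                           # P(Y'=1 | do(T'=t')) in M' (made up)
e_hat = ic_error(
    base_arms=[0, 1, 2],
    sample_base=lambda a, n: rng.binomial(1, p_base[a], n),
    sample_abstract=lambda a, n: rng.binomial(1, p_abs[a], n),
    alpha_reward=lambda y: y,                       # alpha_{Y'} = id
    alpha_arm=lambda a: {0: 0, 1: 1, 2: 1}[a],      # alpha_{T'}
)
print(f"estimated IC error: {e_hat:.3f}")
```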

CMABs

  • A CMAB is a multi-armed bandit problem where each arm corresponds to an intervention in an SCM
  • We assume one variable in the SCM corresponds to the reward
[Figure: a CMAB with arms \text{do}(T=0), \text{do}(T=1), \text{do}(T=2) and corresponding rewards Y_{T=0}, Y_{T=1}, Y_{T=2}]
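
A CMAB can be sketched as a thin wrapper that turns each intervention into an arm whose pull samples the reward variable from the intervened SCM; the toy SCM and its mechanisms below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_reward(t, n=1):
    """Sample Y from a toy SCM T -> M -> Y under do(T = t) (illustrative mechanisms)."""
    m = rng.binomial(1, [0.2, 0.8, 0.5][t], n)        # P(M = 1 | T = t)
    y = rng.binomial(1, np.where(m == 1, 0.7, 0.3))   # P(Y = 1 | M)
    return y

class CMAB:
    """A causal MAB: each arm is an intervention do(T = t); pulling it samples the reward Y."""
    def __init__(self, interventions):
        self.arms = list(interventions)

    def pull(self, arm_index):
        return float(sample_reward(self.arms[arm_index])[0])

bandit = CMAB(interventions=[0, 1, 2])                # arms do(T=0), do(T=1), do(T=2)
print([bandit.pull(i) for i in range(3)])
```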

Regret

  • Our goal is to minimise the simple regret
\bar{R}(T) = \mu^{\star} - \mathbb{E}_{\pi^{(T)}}[\mu_{a^{(T)}}]
  • Here \mu^{\star} is the average reward of the best action, a^{(T)} is the final randomised action drawn from the policy \pi^{(T)}, and \mu_{a^{(T)}} is the average reward of the chosen action

Regret

  • or the cumulative regret
R(T) = T\mu^{\star} - \sum_{t=1}^{T}\mathbb{E}_{\pi^{(t)}}[\mu_{a^{(t)}}]
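
Both regret notions are easy to compute when the true arm means are known; a small sketch (arm means and the action sequence are made up, and the last action played stands in for the final recommendation):

```python
import numpy as np

mu = np.array([0.3, 0.7, 0.65])            # true mean rewards mu_a (made up)
actions = np.array([0, 2, 1, 1, 2, 1, 1])  # a^(1), ..., a^(T) chosen by some bandit algorithm

mu_star = mu.max()
cumulative_regret = len(actions) * mu_star - mu[actions].sum()   # R(T)
simple_regret = mu_star - mu[actions[-1]]                        # bar{R}(T), final action as recommendation
print(cumulative_regret, simple_regret)
```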

CAMABs

[Figure: a base CMAB on \mathcal{M} and an abstract CMAB on \mathcal{M}^{\prime}, related by an abstraction \alpha]

  • Can we transfer information across CMABs?

Transfer of Optimal Action

[Figure: base SCM \texttt{T} \to \texttt{M} \to \texttt{Y} and abstract SCM \texttt{T}^{\prime} \to \texttt{Y}^{\prime}, annotated with the mechanism matrices \begin{bmatrix} .2 & .8 \\ .8 & .2 \end{bmatrix}, \begin{bmatrix} .2 & .8 \\ .8 & .2 \end{bmatrix}, \begin{bmatrix} .7 & .3 \\ .3 & .7 \end{bmatrix}, \begin{bmatrix} .7 & .3 \\ .3 & .7 \end{bmatrix} and the abstraction maps \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}]
  • This abstraction is exact and preserves the optimal intervention

Flipping Interventions

[Figure: the same SCMs \texttt{T} \to \texttt{M} \to \texttt{Y} and \texttt{T}^{\prime} \to \texttt{Y}^{\prime} with mechanism matrices \begin{bmatrix} .2 & .8 \\ .8 & .2 \end{bmatrix}, \begin{bmatrix} .2 & .8 \\ .8 & .2 \end{bmatrix}, \begin{bmatrix} .7 & .3 \\ .3 & .7 \end{bmatrix}, \begin{bmatrix} .7 & .3 \\ .3 & .7 \end{bmatrix}, but with the abstraction maps \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}]
  • This abstraction is exact but does not preserve the optimal action as interventions are flipped

Merging Rewards

[Figure: base SCM \texttt{T} \to \texttt{Y} with Y \in \{1, 1.1, 1.2\} and \mathbb{P}(Y \mid \text{do}(T)) = \begin{bmatrix} .25 & .45 \\ .35 & .1 \\ .4 & .45 \end{bmatrix}, abstract SCM \texttt{T}^{\prime} \to \texttt{Y}^{\prime} with Y^{\prime} \in \{0, 1\} and \mathbb{P}(Y^{\prime} \mid \text{do}(T^{\prime})) = \begin{bmatrix} .6 & .55 \\ .4 & .45 \end{bmatrix}, and abstraction maps \alpha_{\texttt{T}^{\prime}} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \alpha_{\texttt{Y}^{\prime}} = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}]
  • This abstraction is exact but does not preserve the optimal action as reward values are merged
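
A quick numeric check of this example, assuming (as the figure suggests) that the 3x2 matrix is P(Y | do(T)), the 2x3 matrix is α_{Y'}, and the 2x2 stochastic matrix is P(Y' | do(T')): the best base intervention is do(T=0), while the best abstract intervention is do(T'=1):

```python
import numpy as np

# Columns index interventions do(T=0), do(T=1); rows index reward values.
P_Y = np.array([[0.25, 0.45],
                [0.35, 0.10],
                [0.40, 0.45]])
y_vals = np.array([1.0, 1.1, 1.2])                 # Y in {1, 1.1, 1.2}

alpha_Y = np.array([[1, 1, 0],                     # y in {1, 1.1} -> y' = 0
                    [0, 0, 1]])                    # y = 1.2      -> y' = 1
y_abs_vals = np.array([0.0, 1.0])                  # Y' in {0, 1}

P_Y_abs = alpha_Y @ P_Y                            # = [[.6, .55], [.4, .45]], matching the figure

mu_base = y_vals @ P_Y                             # [1.115, 1.100] -> optimal arm do(T=0)
mu_abs = y_abs_vals @ P_Y_abs                      # [0.400, 0.450] -> optimal arm do(T'=1)
print(mu_base, mu_abs)                             # the optimal action is not preserved
```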

Reward Discrepancy

  • The reward discrepancy measures the worst-case difference in rewards before abstraction, \mathbb{P}\{Y \mid \text{do}(x)\}, and after abstraction, \alpha(\mathbb{P}\{Y \mid \text{do}(x)\})
s(\alpha) = \max_{\text{do}(x) \in \mathcal{I}} D_{W_{2}} (\mathbb{P}\{Y \mid \text{do}(x)\}, \alpha(\mathbb{P}\{Y \mid \text{do}(x)\}))

A Triangle Inequality

  • Combining the IC error and reward discrepancy allows us to bound the difference in rewards before and after abstraction
\left|\mu_{a} - \mu_{\alpha(a)}\right| \leq e(\alpha) + s(\alpha)
  • We can easily derive a sufficient condition guaranteeing that the optimal intervention is preserved
e(\alpha) + s(\alpha) \leq \frac{1}{2}\min_{a \in \mathcal{I}} \Delta(a)
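
A one-line check of this sufficient condition, assuming estimates of e(α), s(α) and the suboptimality gaps of the non-optimal base arms are available (the numbers below are made up):

```python
e_alpha, s_alpha = 0.03, 0.02         # estimated IC error and reward discrepancy (made up)
gaps = [0.15, 0.20, 0.25]             # suboptimality gaps Delta(a) of the non-optimal arms (made up)

print(e_alpha + s_alpha <= 0.5 * min(gaps))   # True: the optimal intervention is preserved
```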

Transfer of Actions

  • Let's say you have run a bandit algorithm in         to produce a sequence of action and rewards
\mathcal{D} = \{(a^{(t)}, g^{(t)})\}^{T}_{t=1}
  • When can you simply transfer these actions through the abstraction to obtain a good policy?
a^{(t)} \mapsto \alpha(a^{(t)})
\mathcal{M}

The Imitation Algorithm

[Figure: the imitation algorithm plays a^{(t)} in \mathcal{M} and copies the abstracted action \alpha(a^{(t)}) in \mathcal{M}^{\prime}]
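
A minimal sketch of the imitation algorithm: UCB1 runs on the base CMAB and the imitator replays each abstracted action in the abstract CMAB. The pull functions, reward probabilities and α are made-up stand-ins for whatever CAMAB is at hand:

```python
import math
import numpy as np

def ucb1(pull, n_arms, horizon):
    """Standard UCB1; returns the sequence of arms played."""
    counts, sums, actions = np.zeros(n_arms), np.zeros(n_arms), []
    for t in range(horizon):
        if t < n_arms:
            a = t                                                  # play each arm once
        else:
            a = int(np.argmax(sums / counts + np.sqrt(2 * math.log(t + 1) / counts)))
        counts[a] += 1
        sums[a] += pull(a)
        actions.append(a)
    return actions

def imitate(base_actions, alpha_arm, pull_abstract):
    """Replay each base action through the abstraction in the abstract CMAB."""
    return [pull_abstract(alpha_arm(a)) for a in base_actions]

rng = np.random.default_rng(1)
p_base, p_abs = [0.3, 0.7, 0.65], [0.3, 0.68]                      # Bernoulli arm means (made up)
base_actions = ucb1(lambda a: rng.binomial(1, p_base[a]), n_arms=3, horizon=500)
rewards_abs = imitate(base_actions, lambda a: {0: 0, 1: 1, 2: 1}[a],
                      lambda a: rng.binomial(1, p_abs[a]))
print(np.mean(rewards_abs))
```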

Performance

  • Let's say we run the UCB algorithm on \mathcal{M}^{\prime}.
  • Then we run the UCB algorithm on \mathcal{M} and let the imitator copy it.
  • Which does better?
  • The imitator does better when
3\sum_{a^{\prime} \in \mathcal{I}^{\prime}} \Delta(a^{\prime})(1-|\alpha^{-1}(a^{\prime})|) + 16\log T \sum_{a^{\prime}\in \mathcal{I}^{\prime}} \left( \frac{\Delta(a^{\prime})}{\Delta(a^{\prime})^{2}} - \sum_{a \in \alpha^{-1}(a^{\prime})}\frac{\Delta(a^{\prime})}{\Delta(a)^{2}} \right) \geq 0

Intuition

3\sum_{a^{\prime} \in \mathcal{I}^{\prime}} \Delta(a^{\prime})(1-|\alpha^{-1}(a^{\prime})|) + 16\log T \sum_{a^{\prime}\in \mathcal{I}^{\prime}} \left( \frac{\Delta(a^{\prime})}{\Delta(a^{\prime})^{2}} - \sum_{a \in \alpha^{-1}(a^{\prime})}\frac{\Delta(a^{\prime})}{\Delta(a)^{2}} \right) \geq 0
  • You need to pull more arms in the base model than in the abstract model
  • The first term is a fixed cost for sampling more arms
  • The second term is a scaling cost associated with the suboptimality gaps of each arm: representatives of abstract interventions may have very large suboptimality gaps
  • Optimal arm preservation is required!

Transfer of Expectations

  • Instead of transferring actions, we may abstract the estimated expected reward of each action directly
\hat{\mu}_{a^{\prime}} = \alpha(\hat{\mu}_{a})
  • Using these estimates we can initialise a bandit algorithm to run on \mathcal{M}^{\prime}!

The Transfer Algorithm

[Figure: estimates \hat{\mu}_{a_{1}}, \dots, \hat{\mu}_{a_{K}} obtained in \mathcal{M} are mapped through \alpha to estimates \hat{\mu}_{a^{\prime}_{1}}, \dots, \hat{\mu}_{a^{\prime}_{K^{\prime}}} for \mathcal{M}^{\prime}]

  • What arms can we eliminate?
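
A sketch of the transfer step in the simplest case of a linear α, so base estimates can be pushed through directly; estimates of base arms sharing the same abstract arm are combined by a count-weighted average, which is an illustrative choice rather than the paper's prescription:

```python
import numpy as np

def transfer_estimates(mu_hat_base, counts_base, alpha_arm, alpha_reward, n_abstract_arms):
    """Push base mean-reward estimates through the abstraction to warm-start the abstract CMAB."""
    mu_hat_abs = np.zeros(n_abstract_arms)
    counts_abs = np.zeros(n_abstract_arms)
    for a, (mu, c) in enumerate(zip(mu_hat_base, counts_base)):
        a_prime = alpha_arm(a)
        counts_abs[a_prime] += c
        mu_hat_abs[a_prime] += c * alpha_reward(mu)      # count-weighted combination of abstracted estimates
    played = counts_abs > 0
    mu_hat_abs[played] /= counts_abs[played]
    return mu_hat_abs, counts_abs

# Toy usage: base estimates, play counts and the abstraction are made up.
mu_hat, counts = np.array([0.31, 0.69, 0.64]), np.array([40, 200, 160])
mu0, c0 = transfer_estimates(mu_hat, counts,
                             alpha_arm=lambda a: {0: 0, 1: 1, 2: 1}[a],
                             alpha_reward=lambda y: y,   # linear (identity) alpha on rewards
                             n_abstract_arms=2)
print(mu0, c0)   # warm-start statistics for a bandit algorithm on M'
```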

Abstracting Expectations

  • Of course, the operation \alpha(\hat{\mu}_{a}) may not even make sense if \hat{\mu}_{a} \not\in \mathbb{D}[Y]
  • But we can always replace \alpha with an interpolating function w_{a}: \mathbb{R} \to \mathbb{R} approximating \alpha(Y) as a function of Y

Abstract Expectations

  • When are abstracted expectations good estimators?
\alpha(\mathbb{E}_{\text{do}(x)}[Y]) \approx \mathbb{E}_{\text{do}(x)}[\alpha(Y)] \approx \mathbb{E}_{\text{do}(x^{\prime})}[Y^{\prime}]
  • For the first approximation, we need \alpha to be roughly linear
  • The second approximation can be managed via the abstraction error

Approximation Quality

  • For linear w_{a} we can bound the quality of abstract approximations with high probability
|\mu_{a^{\prime}} - \hat{\mu}_{a^{\prime}}| \leq \sqrt{\frac{2\log(2/\delta)}{\mathcal{C}(a)}} + |\mathbb{E}_{a}[\epsilon(Y)]| + e(\alpha)
  • The first term, \sqrt{\frac{2\log(2/\delta)}{\mathcal{C}(a)}}, reflects the confidence in our approximation \hat{\mu}_{a} of \mu_{a}, where \mathcal{C}(a) is the number of times a was played

  • The second term, |\mathbb{E}_{a}[\epsilon(Y)]|, is the average error of the best linear interpolator for intervention a
|\mathbb{E}_{a}[\epsilon(Y)]| = \min_{w_{a} \in \mathbb{R}}\mathbb{E}_{a}[|w_{a}\cdot Y - \alpha(Y)|]

Eliminating Arms

|\mu_{a^{\prime}} - \hat{\mu}_{a^{\prime}}| \leq \sqrt{\frac{2\log(2/\delta)}{\mathcal{C}(a)}} + |\mathbb{E}_{a}[\epsilon(Y)]| + e(\alpha)
  • Writing \kappa^{\prime} for the right-hand side of this bound for arm a^{\prime}, this suggests action a^{\prime} can be eliminated if
\exists a^{\prime\prime} \in \mathcal{I}^{\prime}, \quad \hat{\mu}_{a^{\prime\prime}} - \kappa^{\prime\prime} \geq \hat{\mu}_{a^{\prime}} + \kappa^{\prime}
  • Unfortunately, we may not have access to \kappa^{\prime}
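
A sketch of the resulting elimination rule, assuming a confidence width κ' can be computed (or upper bounded) for every abstract arm from the three terms of the bound above; the estimates and widths below are made up:

```python
import numpy as np

def surviving_arms(mu_hat_abs, kappa):
    """Keep a' unless some a'' satisfies mu_hat[a''] - kappa[a''] >= mu_hat[a'] + kappa[a']."""
    mu_hat_abs, kappa = np.asarray(mu_hat_abs), np.asarray(kappa)
    lower, upper = mu_hat_abs - kappa, mu_hat_abs + kappa
    return [a for a in range(len(mu_hat_abs)) if not np.any(lower >= upper[a])]

print(surviving_arms(mu_hat_abs=[0.40, 0.62, 0.58], kappa=[0.05, 0.06, 0.07]))   # arm 0 is eliminated
```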

Cases

  • If \alpha is linear then the interpolation error disappears:
|\mathbb{E}_{a}[\epsilon(Y)]| = 0
  • If we have access to action-reward samples we can also ignore the interpolation error by abstracting rewards directly:
\hat{\mu}_{a^{\prime}} = \frac{1}{\mathcal{C}(a)}\sum^{T}_{t=1}\mathbf{1}(a^{(t)} = a)\,\alpha(g^{(t)})
  • We can always upper bound the interpolation error with Chebyshev regression:
|\mathbb{E}_{a}[\epsilon(Y)]| \leq \min_{w\in\mathbb{R}}\max_{y \in \mathbb{D}[Y]}|w \cdot y - \alpha(y)|
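
A small sketch of the last two quantities: the best linear interpolator's average error under an interventional distribution, and the Chebyshev-style worst-case bound over the reward domain. The nonlinear α, the reward domain and the interventional distribution are made up, and the grid search over w is an illustrative shortcut:

```python
import numpy as np

alpha = lambda y: np.minimum(y, 1.0)              # made-up nonlinear reward abstraction
y_domain = np.array([0.0, 0.5, 1.0, 1.5, 2.0])    # D[Y]

rng = np.random.default_rng(0)
y_samples = rng.choice(y_domain, size=5_000,
                       p=[0.1, 0.2, 0.3, 0.25, 0.15])   # samples from P(Y | do(a)), made up

ws = np.linspace(-2, 2, 2001)                     # crude grid over linear interpolators w
avg_err = min(np.mean(np.abs(w * y_samples - alpha(y_samples))) for w in ws)   # |E_a[eps(Y)]|
cheb = min(np.max(np.abs(w * y_domain - alpha(y_domain))) for w in ws)         # Chebyshev upper bound
print(avg_err, cheb)                              # the average error never exceeds the worst-case bound
```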

Summary

  • This is preliminary work investigating the role of abstraction in decision-making
  • The IC error is not the only important criterion when using abstractions for decision-making
  • CAMABs give a causal formulation of transfer learning for bandit problems