UAI Slides copy (Fabio)

Causally Abstracted Multi-armed Bandits

Fabio Massimo Zennaro	Nicholas Bishop	Joel Dyer
Yorgos Felekis	Anisoara Calinescu	Michael Wooldridge
	Theodoros Damoulas

Introduction

Causal abstraction is becoming an increasingly popular paradigm to relate causal models

But how can causal abstraction be used to effectively transfer decision-making policies?

To investigate this question we introduce Causally Abstracted Multi-armed Bandits (CAMABs)

We study several generic approaches for transferring information between causal MABs via causal abstraction

Causal Abstraction

Given two structural causal models (SCMs), a base model and an abstract model,

\mathcal{M} = \langle \mathcal{X}, \mathcal{U}, \mathcal{F}, \mathbb{P}(\mathcal{U}) \rangle

\mathcal{M^{\prime}} = \langle \mathcal{X}^{\prime}, \mathcal{U}^{\prime}, \mathcal{F}^{\prime}, \mathbb{P}(\mathcal{U}^{\prime}) \rangle

An abstraction is a tuple

\langle V, m, \alpha \rangle

Causal Abstraction

V \subset \mathcal{X}

Subset of relevant variables in \mathcal{X}

m: V \to \mathcal{X}^{\prime}

Surjective map from the relevant variables onto the abstract variables

\alpha = (\alpha_{X^{\prime}})_{X^{\prime} \in \mathcal{X}^{\prime}}

Collection of maps, one for each abstract variable

\alpha_{X^{\prime}}: \mathbb{D}[m^{-1}(X^{\prime})] \to \mathbb{D}[X^{\prime}]

\mathbb{D}[m^{-1}(X^{\prime})]: domain of the base variables mapped to X^{\prime}

\mathbb{D}[X^{\prime}]: domain of the abstract variable X^{\prime}

Causal Abstraction

[Figure: base model over \texttt{T}, \texttt{M}, \texttt{Y}; abstract model over \texttt{T}^{\prime}, \texttt{Y}^{\prime}]


\mathcal{I} = \{ \text{do}(\texttt{T} = 0), \text{do}(\texttt{T} = 1), \text{do}(\texttt{T} = 2) \}

\mathcal{I}^{\prime} = \{ \text{do}(\texttt{T}^{\prime} = 0), \text{do}(\texttt{T}^{\prime} = 1)\}

[Figure: the abstraction \alpha maps the base model onto the abstract model]


V = \{\texttt{T}, \texttt{Y}\}

m(\texttt{T}) = \texttt{T}^{\prime}

m(\texttt{Y}) = \texttt{Y}^{\prime}


\alpha_{\texttt{T}^{\prime}}(0) = 0

\alpha_{\texttt{T}^{\prime}}(1) = 1

\alpha_{\texttt{T}^{\prime}}(2) = 1

\alpha_{\texttt{Y}^{\prime}} = \text{id}


\alpha(\text{do}(\texttt{T} = 0)) = \text{do}(\texttt{T}^{\prime} = 0)

\alpha(\text{do}(\texttt{T} = 1)) = \alpha(\text{do}(\texttt{T} = 2)) = \text{do}(\texttt{T}^{\prime} = 1)
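A minimal Python sketch of the running example, assuming variables are named by strings and values by integers; the names `V`, `m`, `alpha_T`, `alpha_Y` and `alpha_do` are hypothetical. It shows how the value map \alpha_{\texttt{T}^{\prime}} induces the map on interventions.

```python
# Hypothetical encoding of the abstraction <V, m, alpha> from the running example.
V = {"T", "Y"}                   # relevant base variables; M is abstracted away
m = {"T": "T'", "Y": "Y'"}       # surjective map from V onto the abstract variables
alpha_T = {0: 0, 1: 1, 2: 1}     # alpha_{T'}: D[T] -> D[T']


def alpha_Y(y):
    """alpha_{Y'} is the identity on reward values."""
    return y


def alpha_do(t):
    """Map the base intervention do(T = t) to do(T' = alpha_{T'}(t))."""
    return (m["T"], alpha_T[t])


for t in (0, 1, 2):              # the base interventions in I
    var, value = alpha_do(t)
    print(f"do(T = {t}) -> do({var} = {value})")
```

Running it prints one line per base intervention, with do(T = 1) and do(T = 2) both sent to do(T' = 1).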

Interventional Consistency

[Figure: commutative diagram. Intervening with \text{do}(x) in \mathcal{M} gives \mathbb{P}(Y \mid \text{do}(x)), which \alpha_{Y^{\prime}} maps to a distribution over Y^{\prime}; alternatively, \alpha_{X^{\prime}} maps x to x^{\prime}, and intervening with \text{do}(x^{\prime}) in \mathcal{M}^{\prime} gives \mathbb{P}(Y^{\prime} \mid \text{do}(x^{\prime}))]

If this diagram commutes for all interventions, we say that the abstraction is interventionally consistent

Unfortunately, it is unlikely that we can get all such diagrams to commute: the two paths may lead to different distributions over the abstract reward (Y^{\prime} versus Y^{\prime\prime})


This motivates the definition of the interventional consistency (IC) error

IC Error

The IC error is the worst-case distance between the results of abstracting then intervening and of intervening then abstracting

e(\alpha) = \max_{\text{do}(x) \in \mathcal{I}}D_{W_{2}}(\alpha(\mathbb{P}\{Y \mid \text{do}(x)\}), \mathbb{P}\{Y^{\prime} \mid \alpha(\text{do}(x))\})

\alpha(\mathbb{P}\{Y \mid \text{do}(x)\}): intervene then abstract

\mathbb{P}\{Y^{\prime} \mid \alpha(\text{do}(x))\}: abstract then intervene

D_{W_{2}}: 2-Wasserstein distance
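A minimal sketch of the IC error for finite, real-valued reward distributions. It assumes distributions are stored as dictionaries mapping values to probabilities and that \alpha is given as two Python callables (one on interventions, one on reward values). It uses the fact that in one dimension the optimal 2-Wasserstein coupling matches quantiles. The helper names and the toy numbers in the usage lines are illustrative, not from the slides.

```python
def pushforward(dist, f):
    """Push a discrete distribution {value: prob} forward through the map f."""
    out = {}
    for y, p in dist.items():
        out[f(y)] = out.get(f(y), 0.0) + p
    return out


def wasserstein2(p, q, tol=1e-12):
    """2-Wasserstein distance between two discrete distributions on the real line.

    The 1D optimal coupling matches quantiles, so we walk through both sorted
    supports and transport mass greedily from left to right.
    """
    a, b = sorted(p.items()), sorted(q.items())
    i = j = 0
    ra, rb = a[0][1], b[0][1]        # mass still to be moved at the current atoms
    cost = 0.0
    while i < len(a) and j < len(b):
        moved = min(ra, rb)
        cost += moved * (a[i][0] - b[j][0]) ** 2
        ra, rb = ra - moved, rb - moved
        if ra <= tol:                # current atom of p exhausted
            i += 1
            ra = a[i][1] if i < len(a) else 0.0
        if rb <= tol:                # current atom of q exhausted
            j += 1
            rb = b[j][1] if j < len(b) else 0.0
    return cost ** 0.5


def ic_error(base, abstract, alpha_do, alpha_y):
    """e(alpha) = max_x W2(alpha(P(Y | do(x))), P(Y' | alpha(do(x))))."""
    return max(
        wasserstein2(pushforward(base[x], alpha_y), abstract[alpha_do(x)])
        for x in base
    )


# Toy example (illustrative numbers): three base arms, two abstract arms.
base = {0: {0: 1.0}, 1: {0: 0.5, 1: 0.5}, 2: {1: 1.0}}
abstract = {0: {0: 1.0}, 1: {1: 1.0}}
e = ic_error(base, abstract, {0: 0, 1: 1, 2: 1}.get, lambda y: y)
print(e)  # sqrt(0.5), about 0.7071
```

Only do(T = 1) contributes here: half of its mass sits at 0 but the abstract arm puts all mass at 1.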

CMABs

A CMAB is a multi-armed bandit problem where each arm corresponds to an intervention in an SCM

We assume one variable in the SCM corresponds to the reward

[Figure: arms \text{do}(T=0), \text{do}(T=1), \text{do}(T=2) with rewards Y_{T=0}, Y_{T=1}, Y_{T=2}]

Regret

Our goal is to minimise the simple regret

\bar{R}(T) = \mu^{\star} - \mathbb{E}_{\pi^{(T)}}[\mu_{a^{(T)}}]

a^{(T)} \sim \pi^{(T)}: final randomised action

\mu^{\star}: average reward of the best action

\mu_{a^{(T)}}: average reward of the chosen action

Regret

or the cumulative regret

R(T) = T\mu^{\star} - \sum_{t=1}^{T}\mathbb{E}_{\pi^{(t)}}[\mu_{a^{(t)}}]
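Both regret notions translate directly into code. A minimal sketch, assuming arms are indexed 0..K-1, each policy \pi^{(t)} is a probability vector over arms, and the true means \mu_{a} are known (as in a simulation); the function names and numbers are hypothetical.

```python
def simple_regret(mu, final_policy):
    """mu* minus the expected mean of an action drawn from pi^(T)."""
    return max(mu) - sum(p * m for p, m in zip(final_policy, mu))


def cumulative_regret(mu, policies):
    """T * mu* minus the sum over rounds of the expected mean under pi^(t)."""
    best = max(mu)
    return sum(best - sum(p * m for p, m in zip(pi, mu)) for pi in policies)


mu = [0.2, 0.5, 0.4]                              # hypothetical arm means
print(simple_regret(mu, [0.5, 0.5, 0.0]))         # 0.5 - 0.35
print(cumulative_regret(mu, [[1, 0, 0], [0, 1, 0], [0, 0, 1]]))
```

Summing the per-round gaps, as `cumulative_regret` does, is equivalent to subtracting the summed expectations from T\mu^{\star}.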

CAMABs

[Figure: a CMAB on the base model \mathcal{M} and a CMAB on the abstract model \mathcal{M}^{\prime}, related by the abstraction \alpha]

Can we transfer information across CMABs?

Transfer of Optimal Action

[Figure: base model over \texttt{T}, \texttt{M}, \texttt{Y} and abstract model over \texttt{T}^{\prime}, \texttt{Y}^{\prime}, annotated with the following matrices]

\begin{bmatrix} .2 & .8 \\ .8 & .2 \end{bmatrix}

\begin{bmatrix} .7 & .3 \\ .3 & .7 \end{bmatrix}

\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}

\begin{bmatrix} .7 & .3 \\ .3 & .7 \end{bmatrix}


This abstraction is exact and preserves the optimal intervention

Flipping Interventions

[Figure: base model over \texttt{T}, \texttt{M}, \texttt{Y} and abstract model over \texttt{T}^{\prime}, \texttt{Y}^{\prime}, annotated with the following matrices]

\begin{bmatrix} .2 & .8 \\ .8 & .2 \end{bmatrix}

\begin{bmatrix} .7 & .3 \\ .3 & .7 \end{bmatrix}

\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}

\begin{bmatrix} .7 & .3 \\ .3 & .7 \end{bmatrix}

This abstraction is exact, but it does not preserve the optimal action because the interventions are flipped

Merging Rewards

[Figure: base model over \texttt{T}, \texttt{Y} and abstract model over \texttt{T}^{\prime}, \texttt{Y}^{\prime}, annotated with the following matrices]

\begin{bmatrix} .25 & .45 \\ .35 & .1 \\ .4 & .45 \end{bmatrix}

\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}

\begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}

\begin{bmatrix} .6 & .55 \\ .4 & .45 \end{bmatrix}

Y \in \{1, 1.1, 1.2\}

Y^{\prime} \in \{0, 1\}

This abstraction is exact, but it does not preserve the optimal action because reward values are merged
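A worked check of this example using NumPy. It assumes the matrices are column-stochastic, with columns indexed by the treatment value and rows by the reward value, so that the 3x2 matrix is P(Y | do(T)), the 2x2 identity is \alpha on T, the 2x3 matrix is \alpha_{Y^{\prime}} and the last matrix is P(Y' | do(T')). This reading is our interpretation of the slide. Under it, the sketch confirms that the abstraction is exact and that the optimal action changes.

```python
import numpy as np

Y_vals = np.array([1.0, 1.1, 1.2])          # D[Y]
Yp_vals = np.array([0.0, 1.0])              # D[Y']
P_Y = np.array([[.25, .45],                 # P(Y | do(T)): rows Y, columns T
                [.35, .10],
                [.40, .45]])
alpha_T = np.eye(2)                         # alpha on T: identity
alpha_Y = np.array([[1, 1, 0],              # alpha_{Y'}: Y in {1, 1.1} -> 0, Y = 1.2 -> 1
                    [0, 0, 1]])
P_Yp = np.array([[.60, .55],                # P(Y' | do(T')): rows Y', columns T'
                 [.40, .45]])

# Exactness: intervene-then-abstract equals abstract-then-intervene.
exact = np.allclose(alpha_Y @ P_Y, P_Yp @ alpha_T)

mu_base = Y_vals @ P_Y      # expected rewards of do(T=0), do(T=1)
mu_abs = Yp_vals @ P_Yp     # expected rewards of do(T'=0), do(T'=1)

print(exact)                                              # True
print(mu_base, mu_abs)
print(int(np.argmax(mu_base)), int(np.argmax(mu_abs)))    # 0 1
```

In the base model do(T = 0) is optimal (1.115 against 1.1), while in the abstract model do(T' = 1) is optimal (0.45 against 0.4): merging the rewards 1 and 1.1 into 0 reverses the ranking.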

Reward Discrepancy

s(\alpha) = \max_{\text{do}(x) \in \mathcal{I}} D_{W_{2}} (\mathbb{P}\{Y \mid \text{do}(x)\}, \alpha(\mathbb{P}\{Y \mid \text{do}(x)\}))

The reward discrepancy measures the worst-case distance between the reward distributions before and after abstraction

\mathbb{P}\{Y \mid \text{do}(x)\}: before abstraction

\alpha(\mathbb{P}\{Y \mid \text{do}(x)\}): after abstraction
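A sketch computing s(\alpha) for the merging-rewards example, reading each column of its 3x2 matrix as P(Y | do(T = t)) (our assumption). Distributions are dictionaries from values to probabilities, and the 1D 2-Wasserstein distance is computed with the quantile coupling. All helper names are hypothetical.

```python
def pushforward(dist, f):
    """Push a discrete distribution {value: prob} forward through the map f."""
    out = {}
    for y, p in dist.items():
        out[f(y)] = out.get(f(y), 0.0) + p
    return out


def wasserstein2(p, q, tol=1e-12):
    """2-Wasserstein distance between discrete distributions on the real line
    (quantile coupling: move mass greedily between the sorted supports)."""
    a, b = sorted(p.items()), sorted(q.items())
    i = j = 0
    ra, rb = a[0][1], b[0][1]
    cost = 0.0
    while i < len(a) and j < len(b):
        moved = min(ra, rb)
        cost += moved * (a[i][0] - b[j][0]) ** 2
        ra, rb = ra - moved, rb - moved
        if ra <= tol:
            i += 1
            ra = a[i][1] if i < len(a) else 0.0
        if rb <= tol:
            j += 1
            rb = b[j][1] if j < len(b) else 0.0
    return cost ** 0.5


def reward_discrepancy(P_Y, alpha_y):
    """s(alpha) = max_x W2(P(Y | do(x)), alpha(P(Y | do(x))))."""
    return max(wasserstein2(P_Y[x], pushforward(P_Y[x], alpha_y)) for x in P_Y)


# Merging-rewards example, one reward distribution per base intervention do(T = t).
P_Y = {0: {1.0: 0.25, 1.1: 0.35, 1.2: 0.40},
       1: {1.0: 0.45, 1.1: 0.10, 1.2: 0.45}}
ALPHA_Y = {1.0: 0, 1.1: 0, 1.2: 1}      # alpha_{Y'} merges 1 and 1.1 into 0

s = reward_discrepancy(P_Y, ALPHA_Y.__getitem__)
print(s)
```

Under these assumptions the worst case is do(T = 0), giving s(\alpha) = \sqrt{0.6895}, about 0.83, even though the abstraction is exact.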

A Triangle Inequality

Combining the IC error and reward discrepancy allows us to bound the difference in rewards before and after abstraction

\left|\mu_{a} - \mu_{\alpha(a)}\right| \leq e(\alpha) + s(\alpha)

We can easily derive a sufficient condition guaranteeing that the optimal intervention is preserved

e(\alpha) + s(\alpha) \leq \frac{1}{2}\min_{a \in \mathcal{I}} \Delta(a)
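A small check of both statements on the merging-rewards example. We use expected rewards read off its matrices (E[Y | do(T=0)] = 1.115, E[Y | do(T=1)] = 1.1, E[Y' | do(T'=0)] = 0.4, E[Y' | do(T'=1)] = 0.45), e(\alpha) = 0 since the abstraction is exact, and s(\alpha) = \sqrt{0.6895} from the quantile coupling. These values are our own computation under a column-wise reading of the matrices. The function names are hypothetical, and taking the minimum gap over suboptimal actions only is an assumption.

```python
def triangle_bound_holds(mu_base, mu_abs, alpha_do, e, s):
    """Check |mu_a - mu_{alpha(a)}| <= e(alpha) + s(alpha) for every base action a."""
    return all(abs(mu_base[a] - mu_abs[alpha_do[a]]) <= e + s for a in mu_base)


def preservation_condition(mu_base, e, s):
    """Sufficient condition e(alpha) + s(alpha) <= min_a Delta(a) / 2.

    Assumption: Delta(a) = mu* - mu_a, minimised over suboptimal base actions only
    (including the optimal action's zero gap would force e + s <= 0).
    """
    best = max(mu_base.values())
    gaps = [best - m for m in mu_base.values() if best - m > 0]
    if not gaps:
        raise ValueError("need at least one suboptimal action")
    return e + s <= 0.5 * min(gaps)


# Merging-rewards example (values computed by hand from the slide's matrices).
mu_base = {0: 1.115, 1: 1.1}       # E[Y | do(T = t)]
mu_abs = {0: 0.4, 1: 0.45}         # E[Y' | do(T' = t')]
alpha_do = {0: 0, 1: 1}            # alpha on T is the identity
e, s = 0.0, 0.6895 ** 0.5          # exact abstraction; reward discrepancy

print(triangle_bound_holds(mu_base, mu_abs, alpha_do, e, s))   # True
print(preservation_condition(mu_base, e, s))                   # False
```

The bound holds, but the sufficient condition fails (e + s is about 0.83 against half the gap, 0.0075), which is consistent with the optimal action not being preserved.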

Transfer of Actions

Suppose you have run a bandit algorithm in \mathcal{M} to produce a sequence of actions and rewards

\mathcal{D} = \{(a^{(t)}, g^{(t)})\}^{T}_{t=1}

When can you simply transfer these actions through the abstraction to obtain a good policy?

a^{(t)} \mapsto \alpha(a^{(t)})

The Imitation Algorithm

[Figure: each action a^{(t)} played in \mathcal{M} is mapped to \alpha(a^{(t)}), which the imitator plays in \mathcal{M}^{\prime}]
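A minimal sketch of the imitation step, assuming \mathcal{D} is stored as a list of (action, reward) pairs and \alpha on interventions as a dictionary; `imitate` and the data are hypothetical.

```python
def imitate(history, alpha_do):
    """Imitation: replay each base action a^(t) of D as alpha(a^(t)) in M'.

    `history` is D = [(a^(t), g^(t)), ...]; the base rewards g^(t) are not used.
    """
    return [alpha_do[a] for a, _ in history]


# Running example: do(T=1) and do(T=2) both map to do(T'=1).
alpha_do = {0: 0, 1: 1, 2: 1}
D = [(0, 0.3), (2, 0.9), (1, 0.7), (2, 0.8)]    # hypothetical (action, reward) pairs
print(imitate(D, alpha_do))                      # [0, 1, 1, 1]
```

The imitator never looks at rewards, so its quality depends entirely on how well the base algorithm's choices survive the map \alpha.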

Performance

Suppose we run the UCB algorithm on \mathcal{M}^{\prime}.

Alternatively, we run the UCB algorithm on \mathcal{M} and let the imitator copy it.

Which does better?


The imitator does better when

3\sum_{a^{\prime} \in \mathcal{I}^{\prime}} \Delta(a^{\prime})(1-|\alpha^{-1}(a^{\prime})|) + 16\log T \sum_{a^{\prime}\in \mathcal{I}^{\prime}} \left( \frac{\Delta(a^{\prime})}{\Delta(a^{\prime})^{2}} - \sum_{a \in \alpha^{-1}(a^{\prime})}\frac{\Delta(a^{\prime})}{\Delta(a)^{2}} \right) \geq 0
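A minimal sketch that evaluates the left-hand side of this condition, assuming the suboptimality gaps are known (as in a simulation) and \alpha on interventions is a dictionary. The expression reads as the difference between a UCB regret bound on \mathcal{M}^{\prime} and the corresponding bound for the imitator. Two conventions are our own assumptions, not from the slides: terms with \Delta(a^{\prime}) = 0 are dropped, and if a base arm with \Delta(a) = 0 maps to an abstract arm with \Delta(a^{\prime}) > 0, the function returns -inf. The function name and gaps are hypothetical.

```python
import math


def imitator_advantage(gap_base, gap_abs, alpha_do, T):
    """Left-hand side of the condition; the imitator does better when it is >= 0.

    gap_base[a] = Delta(a) in M, gap_abs[a'] = Delta(a') in M',
    alpha_do[a] = alpha(a), T = horizon.
    """
    # alpha^{-1}(a'): base actions mapped to each abstract action
    preimage = {ap: [a for a in gap_base if alpha_do[a] == ap] for ap in gap_abs}

    # Fixed cost: 3 * sum_{a'} Delta(a') * (1 - |alpha^{-1}(a')|)
    fixed = 3 * sum(gap_abs[ap] * (1 - len(preimage[ap])) for ap in gap_abs)

    # Scaling cost: sum_{a'} (Delta(a')/Delta(a')^2 - sum_a Delta(a')/Delta(a)^2)
    scaling = 0.0
    for ap, d_ap in gap_abs.items():
        if d_ap == 0:
            continue                       # assumed convention: drop 0/0 terms
        if any(gap_base[a] == 0 for a in preimage[ap]):
            return -math.inf               # optimal base arm sent to a suboptimal abstract arm
        scaling += d_ap / d_ap**2 - sum(d_ap / gap_base[a] ** 2 for a in preimage[ap])

    return fixed + 16 * math.log(T) * scaling


# Hypothetical gaps for the running example (do(T=1), do(T=2) -> do(T'=1)).
alpha_do = {0: 0, 1: 1, 2: 1}
print(imitator_advantage({0: 0.0, 1: 0.5, 2: 0.4}, {0: 0.0, 1: 0.5}, alpha_do, T=1000))
```

With these gaps the value is negative, so running UCB directly on \mathcal{M}^{\prime} is preferable; a one-to-one \alpha whose base gaps exceed the abstract gaps gives a positive value instead.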

Intuition

First term: fixed cost for sampling more arms, since you need to pull more arms in the base model than in the abstract model

Second term: scaling cost associated with the suboptimality gaps of each arm, since representatives of abstract interventions may have very large suboptimality gaps

Optimal arm preservation is required! If the optimal base arm a maps to a suboptimal abstract arm, the term \Delta(a^{\prime})/\Delta(a)^{2} is unbounded because \Delta(a) = 0

Transfer of Expectations

Instead of transferring actions we may abstract the expected reward of actions directly

\hat{\mu}_{a^{\prime}} = \alpha(\hat{\mu}_{a})

Using these estimates, we can initialise a bandit algorithm to run on \mathcal{M}^{\prime}!

The Transfer Algorithm

[Figure: estimates \hat{\mu}_{a_{1}}, \dots, \hat{\mu}_{a_{K}} computed in \mathcal{M} are mapped through \alpha to estimates \hat{\mu}_{a^{\prime}_{1}}, \dots, \hat{\mu}_{a^{\prime}_{K^{\prime}}} for the bandit in \mathcal{M}^{\prime}]


What arms can we eliminate?

Abstracting Expectations

Of course, the operation \alpha(\hat{\mu}_{a}) may not even make sense if \hat{\mu}_{a} \not\in \mathbb{D}[Y]

But we can always replace \alpha with an interpolating function w_{a}: \mathbb{R} \to \mathbb{R} that approximates \alpha(Y)

Abstract Expectations

When are abstracted expectations good estimators?

\alpha(\mathbb{E}_{\text{do}(x)}[Y]) \approx \mathbb{E}_{\text{do}(x)}[\alpha(Y)] \approx \mathbb{E}_{\text{do}(x^{\prime})}[Y^{\prime}]

For the first approximation to hold, we need \alpha to be roughly linear

The second approximation can be managed via the abstraction error

Approximation Quality

For linear w_{a}, we can bound the quality of abstract approximations with high probability

|\mu_{a^{\prime}} - \hat{\mu}_{a^{\prime}}| \leq \sqrt{\frac{2\log(2/\delta)}{\mathcal{C}(a)}} + |\mathbb{E}_{a}[\epsilon(Y)]| + e(\alpha)

\sqrt{\frac{2\log(2/\delta)}{\mathcal{C}(a)}}: confidence in our approximation \hat{\mu}_{a} of \mu_{a}, where \mathcal{C}(a) is the number of times a was played

Here the interpolation error term is

|\mathbb{E}_{a}[\epsilon(Y)]| = \min_{w_{a} \in \mathbb{R}}\mathbb{E}_{a}[|w_{a}\cdot Y - \alpha(Y)|]

the average error of the best linear interpolator for intervention a
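The right-hand side of this bound can be evaluated directly once its three ingredients are available. A minimal sketch, assuming C(a), \delta, the interpolation error and e(\alpha) are all known numbers; `confidence_width` is a hypothetical name.

```python
import math


def confidence_width(count, delta, interp_error, ic_err):
    """Right-hand side of the bound for the abstract action a' = alpha(a):

    sqrt(2 log(2/delta) / C(a)) + |E_a[eps(Y)]| + e(alpha),
    where `count` is C(a), the number of times a was played in the base bandit.
    """
    if count <= 0:
        return math.inf           # no samples of a: no guarantee
    return math.sqrt(2 * math.log(2 / delta) / count) + interp_error + ic_err


# Illustrative values: linear alpha (no interpolation error) and an exact abstraction.
for c in (50, 200, 800):
    print(c, confidence_width(c, delta=0.05, interp_error=0.0, ic_err=0.0))
```

Only the first term shrinks as a is played more often; the interpolation error and e(\alpha) are bias terms that more base samples cannot remove.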

Eliminating arms

|\mu_{a^{\prime}} - \hat{\mu}_{a^{\prime}}| \leq \sqrt{\frac{2\log(2/\delta)}{\mathcal{C}(a)}} + |\mathbb{E}_{a}[\epsilon(Y)]| + e(\alpha)

Writing \kappa^{\prime} for the right-hand side of this bound for a^{\prime}, this suggests that action a^{\prime} can be eliminated if

\exists a^{\prime\prime} \in \mathcal{I}^{\prime}, \quad \hat{\mu}_{a^{\prime\prime}} - \kappa^{\prime\prime} \geq \hat{\mu}_{a^{\prime}} + \kappa^{\prime}

Unfortunately, we may not have access to \kappa^{\prime}
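A minimal sketch of the elimination rule, assuming the widths \kappa are known or have been upper-bounded. We also assume a^{\prime\prime} must differ from a^{\prime}; otherwise an arm with zero width would eliminate itself. `eliminable` and the numbers are hypothetical.

```python
def eliminable(mu_hat, kappa):
    """Abstract actions a' for which some other a'' has
    mu_hat[a''] - kappa[a''] >= mu_hat[a'] + kappa[a']."""
    out = set()
    for a in mu_hat:
        upper = mu_hat[a] + kappa[a]      # optimistic value of a'
        if any(mu_hat[b] - kappa[b] >= upper for b in mu_hat if b != a):
            out.add(a)
    return out


# Hypothetical estimates and widths for three abstract actions.
print(eliminable({0: 0.40, 1: 0.45, 2: 0.80}, {0: 0.1, 1: 0.1, 2: 0.1}))   # {0, 1}
```

The remaining arms can then be handed to a bandit algorithm on \mathcal{M}^{\prime}, initialised with the transferred estimates.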

Cases

If \alpha is linear, the interpolation error disappears

|\mathbb{E}_{a}[\epsilon(Y)]| = 0

If we have access to action-reward samples, we can also ignore the interpolation error by abstracting each observed reward directly

\hat{\mu}_{a^{\prime}} = \frac{1}{\mathcal{C}(a)}\sum^{T}_{t=1}\mathbf{1}(a^{(t)} = a)\alpha(g^{(t)})
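A minimal sketch of this sample-based estimate, assuming \mathcal{D} is a list of (action, reward) pairs and \alpha_{Y^{\prime}} is a dictionary on reward values. The merging map \alpha(1) = \alpha(1.1) = 0, \alpha(1.2) = 1 is used for illustration; the function name and data are hypothetical.

```python
def abstract_estimate_from_samples(history, a, alpha_y):
    """mu_hat_{a'} = (1 / C(a)) * sum_t 1(a^(t) = a) * alpha(g^(t))."""
    rewards = [alpha_y[g] for act, g in history if act == a]
    if not rewards:
        raise ValueError("action a was never played, so C(a) = 0")
    return sum(rewards) / len(rewards)


ALPHA_Y = {1.0: 0, 1.1: 0, 1.2: 1}                     # merges 1 and 1.1 into 0
D = [(0, 1.0), (0, 1.2), (1, 1.1), (0, 1.1)]           # hypothetical samples
print(abstract_estimate_from_samples(D, 0, ALPHA_Y))   # 1/3
```

Because \alpha is applied to each reward before averaging, no interpolating function w_{a} is needed.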

We can always upper-bound the interpolation error with Chebyshev regression

|\mathbb{E}_{a}[\epsilon(Y)]| \leq \min_{w\in\mathbb{R}}\max_{y \in \mathbb{D}[Y]}|w \cdot y - \alpha(y)|
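For a finite \mathbb{D}[Y], the minimax (Chebyshev) fit of a line through the origin, w \cdot y \approx \alpha(y), can be solved exactly. The objective is convex and piecewise linear in w, so it suffices to evaluate it at the breakpoints where two of the pieces \pm(w y_{i} - \alpha(y_{i})) cross. A minimal sketch, with a hypothetical function name, applied to the merging map \alpha(1) = \alpha(1.1) = 0, \alpha(1.2) = 1:

```python
def chebyshev_linear_fit(ys, targets):
    """Solve min_w max_i |w * ys[i] - targets[i]| exactly for a scalar w."""
    def objective(w):
        return max(abs(w * y - t) for y, t in zip(ys, targets))

    candidates = [0.0]            # covers the degenerate case where all ys are 0
    n = len(ys)
    for i in range(n):
        for j in range(i, n):
            # pieces with equal sign cross: w*y_i - t_i = w*y_j - t_j
            if ys[i] != ys[j]:
                candidates.append((targets[i] - targets[j]) / (ys[i] - ys[j]))
            # pieces with opposite sign cross: w*y_i - t_i = -(w*y_j - t_j)
            # (for i == j this is the zero of |w*y_i - t_i|)
            if ys[i] + ys[j] != 0:
                candidates.append((targets[i] + targets[j]) / (ys[i] + ys[j]))
    best = min(candidates, key=objective)
    return best, objective(best)


w, err = chebyshev_linear_fit([1.0, 1.1, 1.2], [0, 0, 1])
print(w, err)
```

Here the optimum is w = 1/2.3, giving an upper bound of 1.1/2.3, about 0.48, on the interpolation error.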

Summary

This is preliminary work investigating the role of abstraction in decision-making

IC error is not the only important criterion when using abstractions for decision-making

CAMABs give a causal formulation of transfer learning for bandit problems