Causally Abstracted Multi-armed Bandits
Fabio Massimo Zennaro, Nicholas Bishop, Joel Dyer, Yorgos Felekis, Anisoara Calinescu, Michael Wooldridge, Theodoros Damoulas
Introduction
- Causal abstraction is becoming an increasingly popular paradigm to relate causal models
- But how can causal abstraction be used to effectively transfer decision-making policies?
- To investigate this question we introduce Causally Abstracted Multi-armed Bandits (CAMABs)
- We study several generic approaches for transferring information between causal MABs via causal abstraction
Causal Abstraction
- Given two SCMs, a base model M and an abstract model M'...
- An abstraction is a tuple (R, a, α) consisting of:
  - R: a subset of relevant variables in M
  - a: a surjective map from the relevant variables of M onto the variables of M'
  - α: a collection of maps, one for each abstract variable, sending the domain of the mapped base variables to the domain of the abstract variable
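To make the tuple concrete, here is a minimal Python sketch of how such an abstraction between two finite SCMs might be stored; the class and the toy variables are illustrative assumptions, not the authors' implementation.

    # Minimal sketch of an abstraction (R, a, alpha) between two finite SCMs.
    from dataclasses import dataclass

    @dataclass
    class Abstraction:
        R: set       # relevant base variables
        a: dict      # surjective map: relevant base variable -> abstract variable
        alpha: dict  # one map per abstract variable, from the domain of its
                     # base preimage to the domain of the abstract variable

    # Toy example: two binary base variables X1, X2 are merged into one abstract Z.
    ab = Abstraction(
        R={"X1", "X2"},
        a={"X1": "Z", "X2": "Z"},
        alpha={"Z": lambda x1, x2: int(x1 or x2)},
    )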
Interventional Consistency
- An abstraction links interventions in M to interventions in M': starting from the base model, we can either intervene then abstract, or abstract then intervene
- If this diagram commutes for all interventions, we say that the abstraction is interventionally consistent
- Unfortunately, it is unlikely that we can get all such diagrams to commute!
- This motivates the definition of the interventional consistency (IC) error
IC Error
- The IC error is the worst-case difference between abstracting then intervening versus intervening then abstracting:

    e(α) = max over interventions ι of  D_W( α(M[ι]), M'[ι'] )

  where α(M[ι]) is the result of intervening then abstracting, M'[ι'] the result of abstracting then intervening, and D_W is the Wasserstein distance
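As a sketch of how the IC error could be estimated from samples (the samplers base_samples and abstract_samples, the outcome map alpha_Y, and the intervention map omega are all assumptions for illustration):

    # Estimate the IC error: worst-case Wasserstein distance between
    # "intervene then abstract" and "abstract then intervene".
    from scipy.stats import wasserstein_distance

    def ic_error(interventions, base_samples, abstract_samples, alpha_Y, omega, n=10_000):
        errs = []
        for i in interventions:
            pushforward = [alpha_Y(y) for y in base_samples(i, n)]  # intervene in M, then abstract
            direct = abstract_samples(omega(i), n)                  # intervene directly in M'
            errs.append(wasserstein_distance(pushforward, direct))
        return max(errs)  # worst case over interventions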
CMABs
- A CMAB is a multi-armed bandit problem where each arm corresponds to an intervention in an SCM
- We assume one variable in the SCM corresponds to the reward
Regret
- Our goal is to minimise the simple regret

    R_T = μ* − E[ μ_{A_T} ]

  where A_T is the final randomised action, μ* is the average reward of the best action, and μ_{A_T} is the average reward of the chosen action
- or the cumulative regret

    R_T = Σ_{t=1..T} ( μ* − E[ μ_{A_t} ] )
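For concreteness, a quick sketch of both notions for a finite-arm CMAB; mu is the vector of mean rewards (assumed known here, for evaluation only) and actions is the sequence of arms played:

    import numpy as np

    def simple_regret(mu, final_action):
        # gap between the best mean reward and the mean reward of the final action
        return np.max(mu) - mu[final_action]

    def cumulative_regret(mu, actions):
        # sum of per-round gaps between the best arm and the arm actually played
        return float(sum(np.max(mu) - mu[a] for a in actions))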
CAMABs
- A CAMAB pairs a base CMAB and an abstract CMAB related by an abstraction α
- Can we transfer information across CMABs?
Transfer of Optimal Action
- This abstraction is exact and preserves the optimal intervention
Flipping Interventions
- This abstraction is exact but does not preserve the optimal action as interventions are flipped
Merging Rewards
- This abstraction is exact but does not preserve the optimal action as reward values are merged
Reward Discrepancy
- The reward discrepancy measures the worst-case difference between the expected reward before abstraction and the expected reward after abstraction
A Triangle Inequality
- Combining the IC error and reward discrepancy allows us to bound the difference in rewards before and after abstraction
- We can easily derive a sufficient condition guaranteeing that the optimal intervention is preserved
Transfer of Actions
- Let's say you have run a bandit algorithm in the base model M to produce a sequence of actions and rewards
- When can you simply transfer these actions through the abstraction to obtain a good policy?
The Imitation Algorithm
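A minimal sketch of the imitation idea, assuming a recorded action sequence from a bandit run on the base model M, the intervention map omega, and a hypothetical abstract_pull helper:

    # Imitation sketch: replay a base-model action sequence in the abstract model.
    def imitate(base_actions, omega, abstract_pull):
        rewards = []
        for i in base_actions:                       # actions chosen by a bandit run on M
            rewards.append(abstract_pull(omega(i)))  # play the corresponding arm in M'
        return rewards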
Performance
- Let's say we run the UCB algorithm on the base model M
- Then we run the UCB algorithm on the abstract model M', and let the imitator copy the base run
- Which does better?
- The imitator does better when a condition comparing the two regret bounds holds; the intuition behind this condition is given below
Intuition
- You need to pull more arms in the base model than in the abstract model: a fixed cost for sampling more arms
- There is also a scaling cost associated with the suboptimality gaps of each arm: representatives of abstract interventions may have very large suboptimality gaps
- Optimal arm preservation is required!
Transfer of Expectations
- Instead of transferring actions, we may abstract the expected rewards of actions directly
- Using these estimates, we can initialise a bandit algorithm to run on the abstract model M'!
The Transfer Algorithm
- What arms can we eliminate?
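A minimal sketch of the transfer idea under stated assumptions: base estimates mu_hat are mapped through an interpolating outcome map f, used to warm-start the abstract bandit, and arms whose optimistic value is dominated are eliminated (conf holds hypothetical high-probability error bounds):

    # Transfer sketch: abstract base reward estimates, then prune abstract arms.
    def transfer_estimates(base_arms, mu_hat, omega, f):
        # abstracted expected reward for each abstract intervention omega(i)
        return {omega(i): f(mu_hat[i]) for i in base_arms}

    def eliminate(est, conf):
        # keep an abstract arm only if its optimistic value reaches the best
        # pessimistic value; conf[i] is a high-probability error bound for arm i
        best_lower = max(est[i] - conf[i] for i in est)
        return {i for i in est if est[i] + conf[i] >= best_lower}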
Abstracting Expectations
- Of course, the operation α_Y(μ̂) may not even make sense if the estimate μ̂ falls outside the (finite) domain of α_Y
- But we can always replace α_Y with an interpolating function f
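A minimal sketch of one way to build such an f, assuming α_Y is known on the finite domain of Y: piecewise-linear interpolation between the points where α_Y is defined (all values here are toy assumptions):

    # Extend the finite-domain map alpha_Y to an interpolating function f,
    # so it can be applied to estimates lying between domain points.
    import numpy as np

    y_domain = np.array([0.0, 1.0, 2.0])        # finite domain of Y (toy)
    alpha_Y = {0.0: 0.0, 1.0: 0.0, 2.0: 1.0}    # toy outcome map (merges two rewards)

    def f(mu_hat):
        return np.interp(mu_hat, y_domain, [alpha_Y[y] for y in y_domain])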
Abstract Expectations
- When are abstracted expectations good estimators?
- For the first equality, f(E[Y]) = E[f(Y)], we need f to be roughly linear.
- The second inequality, relating the abstracted base expectation to the abstract model's expected reward, can be managed via the abstraction error.
Approximation Quality
- For linear α_Y we can bound the quality of abstract approximations with high probability
- The bound holds with a chosen level of confidence in our approximation, and tightens with n_ι, the number of times intervention ι was played
- We can therefore bound the quality of abstract expectations; the bound also features the average error of the best linear interpolator for intervention ι
Eliminating Arms
- This suggests an action can be eliminated if its abstracted estimate, inflated by the bound above, still falls below that of some other action
- Unfortunately, we may not have access to the interpolation error appearing in the bound
Cases
- If α_Y is linear then the interpolation error disappears
- If we have access to action-reward samples we can also ignore the interpolation error
- We can always upper bound the interpolation error with Chebyshev regression
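A sketch of the last case, assuming α_Y is evaluated on a finite real domain: fit a degree-1 Chebyshev polynomial and take the largest residual as an upper bound on the interpolation error over the domain points (values are toy assumptions):

    # Bound the error of the best linear interpolator via Chebyshev regression.
    import numpy as np
    from numpy.polynomial import chebyshev as C

    y_pts = np.array([0.0, 0.5, 1.0, 1.5, 2.0])  # finite domain of Y (toy)
    a_pts = np.array([0.0, 0.0, 1.0, 1.0, 2.0])  # alpha_Y on that domain (toy)

    coefs = C.chebfit(y_pts, a_pts, deg=1)       # best degree-1 fit
    interp_error_bound = np.abs(C.chebval(y_pts, coefs) - a_pts).max()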
Summary
- This is preliminary work investigating the role of abstraction in decision-making
- IC error is not the only important criterion when using abstractions for decision-making
- CAMABs give a causal formulation of transfer learning for bandit problems