Accounting for Real World Phenomena in Machine Learning and Mechanism Design
Nicholas Bishop
Insurance
[Figure: motivating insurance example with quoted premiums of $500, $470 and $450]
Inspection doesn't scale!
Resource Deployment
We must account for resource availability!
Goals
Consider agent incentives when making decisions
Account for real-world constraints on decision making
Contributions
Stackelberg Prediction Games for Linear Regression (Chapter 3)
Adversarial Blocking Bandits (Chapter 4)
Sequential Blocked Matching (Chapter 5)
Strategic Linear Regression
At training time:
At test time:
[Figure: the setting at training and test time, annotated with the target of the data provider]
Choosing a Predictor
Idea: Simulate agent behaviour using training data!
[Equation on slide, annotated with the learner's loss, the agent's loss, and the agent's manipulation cost]
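A minimal sketch of the objective those labels annotate, under the assumption of a squared learner's loss, a squared agent's loss towards a target z_i, and a quadratic manipulation cost weighted by gamma (this notation is mine, not taken from the slides):

\min_{w}\;\frac{1}{n}\sum_{i=1}^{n}\bigl(\hat{x}_i(w)^{\top}w - y_i\bigr)^{2} \qquad \text{(learner's loss)}

\text{where}\quad \hat{x}_i(w) \in \arg\min_{x'}\;\underbrace{\bigl(x'^{\top}w - z_i\bigr)^{2}}_{\text{agent's loss}} \;+\; \underbrace{\gamma\,\lVert x' - x_i\rVert_{2}^{2}}_{\text{agent's manipulation cost}}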
Key Questions
Can we solve such an optimisation problem?
Does the solution generalise well?
Optimisation
Reformulate the problem as a fractional program by substituting out the agent's best response.
Use fractional programming to rewrite this problem as a single-parameter root-finding problem.
Idea: Use bisection search to find a root!
Problem: How do we evaluate the root-finding function at a given parameter value?
Solution: Convert each evaluation to an SDP!
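A minimal sketch of the bisection step, assuming the root-finding function g(lam) is the optimal value of the parametric problem min_w N(w) - lam * D(w), so that g is decreasing in lam and its root equals the optimal ratio; each evaluation of g would be the SDP mentioned above, and all names here are illustrative:

def bisection_root(g, lo, hi, tol=1e-8):
    """Bisection search for the root of a decreasing function g on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            # The optimal ratio is still larger than mid, so search above it.
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)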
Generalisation
Consider what a linear function predicts after manipulation:
Each of these functions is linear and has bounded norm!
Hence we can bound the Rademacher complexity of the resulting hypothesis class!
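For reference, the standard bound being invoked (a sketch: B bounds the norm of the effective linear predictors obtained after manipulation, R bounds the norm of the data, and n is the sample size):

\mathfrak{R}_{n}\bigl(\{\,x \mapsto \langle v, x\rangle \;:\; \lVert v\rVert_{2} \le B\,\}\bigr) \;\le\; \frac{BR}{\sqrt{n}}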
Blocking Bandits
Rewards and Delays in the Real World
Adversarial Blocking Bandits
Rewards vary adversarially in accordance with a path variation budget.
Blocking durations are free to vary arbitrarily, but are bounded above.
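Concretely, a path variation budget constraint typically takes the following form (a sketch with my notation: r_{t,k} is the reward of arm k at round t, T is the horizon, and B_T is the budget):

\sum_{t=1}^{T-1} \max_{k} \bigl|\, r_{t+1,k} - r_{t,k} \,\bigr| \;\le\; B_T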
Full Information Setting
Consider a greedy algorithm which, at each round, pulls the available arm with the highest reward.
Using a knapsack-style proof, we obtain a regret guarantee for this greedy policy.
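A minimal sketch of this greedy policy in the full-information setting (rewards and blocking delays are lists indexed by round and arm; the exact blocking convention is my assumption):

def greedy_blocking(rewards, delays, horizon):
    """Pull the available arm with the highest current reward at every round.

    rewards[t][k]: reward of arm k at round t (known in the full-information setting).
    delays[t][k]:  arm k is assumed blocked until round t + delays[t][k] after a pull at t.
    """
    n_arms = len(rewards[0])
    available_at = [0] * n_arms          # first round at which each arm is free again
    pulls, total = [], 0.0
    for t in range(horizon):
        free = [k for k in range(n_arms) if available_at[k] <= t]
        if not free:
            pulls.append(None)           # every arm is blocked this round
            continue
        k = max(free, key=lambda a: rewards[t][a])
        available_at[k] = t + delays[t][k]
        total += rewards[t][k]
        pulls.append(k)
    return pulls, total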
Bandit Setting
Split the time horizon into blocks. Consider one such block:
At the start of the block, play each arm once and store the rewards observed. Then pull no arms until all arms are available.
Then play greedily, using the rewards received in the first phase as a proxy for the real rewards.
Pull no arms at the end of the block so that all arms will be available at the beginning of the next block.
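A minimal sketch of one such block, written against an illustrative interface (pull and wait are assumed callbacks into the environment; the slides do not fix these names):

def run_block(pull, wait, n_arms, block_len, max_delay):
    """One block: sample every arm once, idle, play greedily on the samples, idle again.

    pull(k): pulls arm k now and returns (observed reward, blocking delay of that pull).
    wait():  skips one round without pulling any arm.
    """
    samples, available_at, t = {}, {}, 0
    # Phase 1: play each arm once and store the rewards observed.
    for k in range(n_arms):
        reward, d = pull(k)
        samples[k], available_at[k] = reward, t + d
        t += 1
    # Idle until all arms are available again.
    while t < max(available_at.values()):
        wait(); t += 1
    # Phase 2: play greedily, using the stored samples as proxies for the real rewards.
    while t < block_len - max_delay:     # leave room for the final idle phase
        free = [k for k in range(n_arms) if available_at[k] <= t]
        if free:
            best = max(free, key=samples.get)
            _, d = pull(best)
            available_at[best] = t + d
        else:
            wait()
        t += 1
    # Phase 3: pull no arms so every arm is available at the start of the next block.
    while t < block_len:
        wait(); t += 1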
Bandit Setting
By appropriately choosing the block length, we can obtain a regret bound in terms of the variation budget.
Problem: We need to know the variation budget to set the block length!
Solution: Run EXP3 as a meta-bandit algorithm to learn the correct block length!
Bandit Setting
Maintain a list of possible budgets and split the time horizon into epochs. Consider one such epoch:
Sample a budget, and thus an associated block length, and run the previous algorithm within the epoch.
At the end of the epoch, update the sampling probability of the chosen budget according to EXP3.
Repeat this process with the next epoch.
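A minimal sketch of the meta-bandit layer, assuming run_epoch(budget) plays one epoch of the block-based algorithm with the block length implied by that budget and returns the total reward rescaled to [0, 1] (the names and the rescaling are my assumptions):

import math, random

def exp3_over_budgets(candidate_budgets, run_epoch, n_epochs, gamma=0.1):
    """Standard EXP3 run over a finite list of candidate variation budgets."""
    K = len(candidate_budgets)
    weights = [1.0] * K
    for _ in range(n_epochs):
        total = sum(weights)
        probs = [(1 - gamma) * w / total + gamma / K for w in weights]
        i = random.choices(range(K), weights=probs)[0]   # sample a budget
        reward = run_epoch(candidate_budgets[i])          # play the epoch with it
        estimate = reward / probs[i]                      # importance-weighted reward
        weights[i] *= math.exp(gamma * estimate / K)      # EXP3 weight update
    return weights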
Sequential Blocked Matching
[Figure: worked example of agents repeatedly matched to services over several rounds, annotated with the agents' preferences]
Requirements
Resistance to strategic manipulation induced by blocking - bound the incentive ratio.
Achieve high social welfare - minimise the distortion.
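Sketched definitions of the two quantities, under my reading of their standard usage (not taken verbatim from the slides):

\text{incentive ratio} \;=\; \max_{i}\;\frac{\text{best utility agent } i \text{ can obtain by misreporting}}{\text{utility of agent } i \text{ under truthful reporting}},
\qquad
\text{distortion} \;=\; \sup_{\text{instances}}\;\frac{\text{optimal social welfare}}{\mathbb{E}\bigl[\text{social welfare of the mechanism}\bigr]}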
Repeated RSD
Generalise RSD by allowing each agent to choose its allocation for the entire time horizon greedily.
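A minimal sketch of this generalisation under my reading of the slides (the per-agent blocking convention and the value representation are assumptions, not taken from the talk):

import random

def repeated_rsd(values, n_rounds, delay):
    """Agents, in a random serial order, greedily pick their whole schedule of services.

    values[a][s]: agent a's value for service s.
    delay[s]:     rounds for which service s is blocked for an agent after being
                  assigned to that agent (assumed per-agent blocking).
    """
    n_agents, n_services = len(values), len(values[0])
    taken = [set() for _ in range(n_rounds)]             # services already used per round
    schedule = {a: [None] * n_rounds for a in range(n_agents)}
    for a in random.sample(range(n_agents), n_agents):   # random serial order
        blocked_until = [0] * n_services
        for t in range(n_rounds):
            free = [s for s in range(n_services)
                    if s not in taken[t] and blocked_until[s] <= t]
            if not free:
                continue
            s = max(free, key=lambda x: values[a][x])    # greedy choice for this round
            schedule[a][t] = s
            taken[t].add(s)
            blocked_until[s] = t + 1 + delay[s]
    return schedule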
Repeated RSD
Repeated RSD is asymptotically optimal in terms of distortion.
Repeated RSD can be derandomised to yield a deterministic algorithm which is also asymptotically optimal!
Repeated RSD also has a bounded incentive ratio!
Bandit Matching
(mean-based)
Bandit RRSD
Idea: Extend RRSD to the bandit setting with an explore-then-commit framework!
In the exploration phase, assign each agent each service a fixed number of times.
Wait until all arms are available.
In the exploitation phase, play RRSD, using the last preference submission of each agent.
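A minimal sketch of the explore-then-commit structure, reusing the repeated_rsd sketch above; play_round and report_values are illustrative callbacks, and the cyclic exploration schedule assumes at least as many services as agents:

def bandit_rrsd(play_round, report_values, n_agents, n_services, horizon,
                n_explore, delay):
    """Exploration by round-robin assignment, then commit to Repeated RSD.

    play_round(matching):  plays one round with matching = {agent: service}.
    report_values(agent):  the agent's most recent reported values over services.
    """
    t = 0
    # Exploration phase: cyclic shifts assign each agent each service n_explore times.
    for _ in range(n_explore):
        for shift in range(n_services):
            play_round({a: (a + shift) % n_services for a in range(n_agents)})
            t += 1
    # (The slides also wait here until all arms are available again.)
    values = [report_values(a) for a in range(n_agents)]  # last preference submissions
    schedule = repeated_rsd(values, horizon - t, delay)   # exploitation via Repeated RSD
    for r in range(horizon - t):
        play_round({a: s[r] for a, s in schedule.items() if s[r] is not None})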
Bandit RRSD
[Figure: worked example of the exploration phase; the numbers are the agents' preferences and the rewards observed as each agent is assigned each service]