Accounting for Real World Phenomena in Machine Learning and Mechanism Design

Nicholas Bishop

Decision-Making in Theory

(\textcolor{blue}{\mathbf{x}}, \textcolor{red}{y})

Real World Complications

(\textcolor{blue}{\mathbf{x}}, \textcolor{red}{y})

Insurance

[Figure: insurance premium example with quotes of $500, $470 and $450.]

Inspection doesn't scale!


Resource Deployment

[Figure: a sequence of resource deployment decisions in which deployed resources become temporarily unavailable.]

We must account for resource availability!

Goals

Consider agent incentives when making decisions

Account for real world constraints on decision making

Contributions

Stackelberg Prediction Games for Linear Regression (Chapter 3)

Adversarial Blocking Bandits (Chapter 4)

Sequential Blocked Matching (Chapter 5)


Strategic Linear Regression

At training time: \{(\textcolor{blue}{\mathbf{x}_{i}}, \textcolor{red}{y_{i}}, \textcolor{purple}{z_{i}})\}^{m}_{i=1}

At test time: an agent with data (\textcolor{blue}{\mathbf{x}}, \textcolor{red}{y}, \textcolor{purple}{z}) reports the manipulated point (\textcolor{blue}{\tilde{\mathbf{x}}}, \textcolor{red}{y}, \textcolor{purple}{z})

\textcolor{purple}{z} is the target of the data provider

Choosing a Predictor

Idea: Simulate agent behaviour using training data!

\min_{\mathbf{w}}\sum^{m}_{i=1}(\mathbf{w}^{\top}\textcolor{blue}{\mathbf{x}_{i}^{\star}} - \textcolor{red}{y_{i}})^{2}
\text{s.t. } \textcolor{blue}{\mathbf{x}_{i}^{\star}} = \arg\min_{\textcolor{blue}{\tilde{\mathbf{x}}_{i}}} (\mathbf{w}^{\top}\textcolor{blue}{\tilde{\mathbf{x}}_{i}} - \textcolor{purple}{z_{i}})^{2} + \gamma\|\textcolor{blue}{\tilde{\mathbf{x}}_{i}} - \textcolor{blue}{\mathbf{x}_{i}}\|^{2}_{2} \quad \forall i

Learner's loss

Agent's loss

Agent's manipulation cost
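
The inner problem has a closed-form solution, which is what makes simulating the agents tractable. Below is a minimal sketch (not taken from the thesis) of the agent's best response and the resulting training loss for a fixed predictor; the helper names are illustrative.

```python
import numpy as np

def agent_best_response(w, x, z, gamma):
    """Best response to predictor w: minimises (w^T x_tilde - z)^2 + gamma * ||x_tilde - x||^2.
    Setting the gradient to zero gives a closed form along the direction of w."""
    shift = (z - w @ x) / (gamma + w @ w)
    # Resulting prediction: (w^T x + (||w||^2 / gamma) z) / (1 + ||w||^2 / gamma)
    return x + shift * w

def strategic_training_loss(w, X, y, z, gamma):
    """Learner's loss when every training agent plays its best response to w."""
    loss = 0.0
    for x_i, y_i, z_i in zip(X, y, z):
        x_star = agent_best_response(w, x_i, z_i, gamma)
        loss += (w @ x_star - y_i) ** 2
    return loss
```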

Key Questions

Can we solve such an optimisation problem?

Does the solution generalise well?


Optimisation

Reformulate the problem as a fractional program

\text{arg}\min_{\mathbf{w}}\left\|\frac{\frac{1}{\gamma}\textcolor{purple}{\mathbf{z}}\|\mathbf{w}\|^{2} + \textcolor{blue}{X}\mathbf{w}}{1 + \frac{1}{\gamma}\|\mathbf{w}\|^{2}} - \textcolor{red}{\mathbf{y}}\right\|^{2}

Substitute out \|\mathbf{w}\|^{2}:

\arg\min_{\mathbf{w}, \alpha}\left\|\frac{\frac{\alpha}{\gamma}\textcolor{purple}{\mathbf{z}} + \textcolor{blue}{X}\mathbf{w}}{1 + \frac{\alpha}{\gamma}} - \textcolor{red}{\mathbf{y}}\right\|^{2}
\text{s.t. } \alpha = \|\mathbf{w}\|^{2}

Optimisation

Use fractional programming to rewrite this problem as a single-parameter root-finding problem:

\arg\min_{\mathbf{w}, \alpha}\left\|\frac{\frac{\alpha}{\gamma}\textcolor{purple}{\mathbf{z}} + \textcolor{blue}{X}\mathbf{w}}{1 + \frac{\alpha}{\gamma}} - \textcolor{red}{\mathbf{y}}\right\|^{2}
\text{s.t. } \alpha = \|\mathbf{w}\|^{2}

F(q) = \min_{\mathbf{w}, \alpha}\left\|\frac{\alpha}{\gamma}\textcolor{purple}{\mathbf{z}} + \textcolor{blue}{X}\mathbf{w} - \textcolor{red}{\mathbf{y}} - \frac{\alpha}{\gamma}\textcolor{red}{\mathbf{y}}\right\|^{2} - q\left(1 + \frac{\alpha}{\gamma}\right)^{2}
\quad \text{s.t. } \alpha = \|\mathbf{w}\|^{2}

The optimal value of the original problem is the value of q at which F(q) = 0.

Idea: Use bisection search to find a root!
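
A minimal bisection sketch of this root-finding scheme, assuming access to an oracle `evaluate_F(q)` that solves the inner minimisation (e.g. via the SDP on the next slide); the function name and tolerance are illustrative.

```python
def bisect_root(evaluate_F, q_lo, q_hi, tol=1e-8):
    """Find q with F(q) ~= 0, where F is non-increasing in q.
    Requires F(q_lo) >= 0 >= F(q_hi)."""
    while q_hi - q_lo > tol:
        q_mid = 0.5 * (q_lo + q_hi)
        if evaluate_F(q_mid) > 0:
            q_lo = q_mid   # root lies to the right
        else:
            q_hi = q_mid   # root lies to the left
    return 0.5 * (q_lo + q_hi)
```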

Optimisation

Problem: How do we evaluate F(q)?

\max_{\tau, \lambda} \:\tau
\text{s.t. }\begin{bmatrix} A + \lambda B & \mathbf{a} + \lambda \mathbf{b} \\ \mathbf{a}^{\top} + \lambda \mathbf{b}^{\top} & c - \tau \end{bmatrix} \succeq 0

Solution: Convert to an SDP!              
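
As an illustration only, an SDP of this shape can be handed to an off-the-shelf solver. In the hedged cvxpy sketch below, the matrices A, B and the vectors a, b and scalar c are treated as given constants coming from the reformulation; they are not constructed here.

```python
import cvxpy as cp

def solve_sdp(A, B, a, b, c):
    """Solve max tau s.t. [[A + lam*B, a + lam*b], [(a + lam*b)^T, c - tau]] >> 0."""
    n = A.shape[0]
    tau = cp.Variable()
    lam = cp.Variable()
    top_right = cp.reshape(a + lam * b, (n, 1))
    M = cp.bmat([[A + lam * B, top_right],
                 [top_right.T, cp.reshape(c - tau, (1, 1))]])
    # Symmetrise explicitly so the PSD constraint is well posed.
    prob = cp.Problem(cp.Maximize(tau), [(M + M.T) / 2 >> 0])
    prob.solve()
    return tau.value, lam.value
```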

Generalisation

Consider what a linear function predicts after manipulation:

h_{\mathbf{w}}(\textcolor{blue}{\mathbf{x}}, \textcolor{purple}{z}) = \frac{\mathbf{w}^{\top}\textcolor{blue}{\mathbf{x}} + \frac{1}{\gamma}\|\mathbf{w}\|^{2}\textcolor{purple}{z}}{1 + \frac{1}{\gamma}\|\mathbf{w}\|^{2}}

Each of these functions is linear in (\textcolor{blue}{\mathbf{x}}, \textcolor{purple}{z}) and has bounded norm!

Hence we can bound the Rademacher complexity of the resulting hypothesis class!
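
For intuition, here is one short way to see the linearity and the norm bound; this is a sketch of the argument, not the thesis's exact constants.

h_{\mathbf{w}}(\mathbf{x}, z) = \mathbf{v}^{\top}\begin{bmatrix}\mathbf{x} \\ z\end{bmatrix},
\qquad
\mathbf{v} = \frac{1}{1 + \frac{1}{\gamma}\|\mathbf{w}\|^{2}}\begin{bmatrix}\mathbf{w} \\ \tfrac{1}{\gamma}\|\mathbf{w}\|^{2}\end{bmatrix},
\qquad
\|\mathbf{v}\|^{2} \leq \sup_{r \geq 0}\frac{r}{(1 + r/\gamma)^{2}} + 1 = \frac{\gamma}{4} + 1 .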

Blocking Bandits

[Figure: arms with mean rewards \textcolor{blue}{\mu_{1}}, \textcolor{blue}{\mu_{2}}, \textcolor{blue}{\mu_{3}} and blocking durations \textcolor{red}{D_{1}}, \textcolor{red}{D_{2}}, \textcolor{red}{D_{3}}; a pulled arm is blocked (unavailable) for its blocking duration.]

Blocking Bandits

[Figure: a worked example with three arms with mean rewards \textcolor{blue}{1}, \textcolor{blue}{5}, \textcolor{blue}{8} and blocking durations \textcolor{red}{1}, \textcolor{red}{2}, \textcolor{red}{2}; pulled arms return noisy rewards (e.g. \textcolor{green}{1.5}, \textcolor{green}{0.9}) and become temporarily unavailable.]

Rewards and Delays in the Real World

[Figure: the same three arms, but both rewards and blocking durations change over time, e.g. from rewards (\textcolor{blue}{1}, \textcolor{blue}{5}, \textcolor{blue}{8}) with delays (\textcolor{red}{1}, \textcolor{red}{2}, \textcolor{red}{2}), to (\textcolor{blue}{10}, \textcolor{blue}{11}, \textcolor{blue}{2}) with delays (\textcolor{red}{3}, \textcolor{red}{5}, \textcolor{red}{3}), to (\textcolor{blue}{2}, \textcolor{blue}{20}, \textcolor{blue}{10}) with delays (\textcolor{red}{6}, \textcolor{red}{7}, \textcolor{red}{6}).]

Adversarial Blocking Bandits

Rewards vary adversarially in accordance with a path variation budget:

\sum^{T-1}_{t=1}\sum^{K}_{k=1}|X^{k}_{t+1} - X^{k}_{t}| \leq B_{T}

Blocking durations are free to vary arbitrarily, but are bounded above:

D^{k}_{t} \leq \tilde{D} \quad \forall k , t

Full Information Setting

Consider a greedy algorithm which pulls the available arm with the highest reward:

\text{arg}\max_{k \in A_{t}}X^{k}_{t}

Using a knapsack-style proof, we obtain the following approximation guarantee relative to the optimal reward r(\pi^{\star}):

\left(1 + \tilde{D}\right)^{-1}\left(1 - \frac{\tilde{D}B_{T}}{r(\pi^{\star})}\right)
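
A minimal sketch of this greedy policy under assumed interfaces (the reward and delay oracles and the blocking convention are illustrative, not the thesis's exact model):

```python
def greedy_blocking(T, K, reward, delay):
    """Greedy for blocking bandits with full information.

    reward(t, k) and delay(t, k) return the current reward and blocking
    duration of arm k; an arm pulled at time t with delay d is assumed
    unavailable until round t + d.
    """
    available_at = [0] * K          # first round at which each arm is free
    total = 0.0
    for t in range(T):
        avail = [k for k in range(K) if available_at[k] <= t]   # the set A_t
        if not avail:
            continue                # no arm available this round
        k = max(avail, key=lambda k: reward(t, k))   # pull highest-reward arm
        total += reward(t, k)
        available_at[k] = t + delay(t, k)            # arm becomes blocked
    return total
```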

Bandit Setting

Split the time horizon into blocks of length \Delta_{T}.

[Figure: the time horizon from 0 to T divided into blocks of length \Delta_{T}.]

Within each block:

At the start of the block, play each arm once, and store the rewards observed. Then pull no arms until all arms are available.

Then play greedily, using the rewards received in the first phase as a proxy for the real rewards.

Pull no arms at the end of the block so all arms will be available at the beginning of the next block.
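
A hedged sketch of one such block, under assumed interfaces (`pull(k)` returns the observed reward and `available(k)` says whether arm k can be pulled); this is illustrative rather than the thesis's exact pseudocode.

```python
def run_block(block_len, D_max, K, pull, available):
    """One block of the explore-then-greedy scheme (illustrative sketch)."""
    estimates = [0.0] * K
    t = 0

    # Phase 1: pull each arm once to obtain a reward estimate.
    for k in range(K):
        estimates[k] = pull(k)
        t += 1

    # Phase 2: pull nothing until every arm is available again.
    while t < block_len and not all(available(k) for k in range(K)):
        t += 1

    # Phase 3: play greedily w.r.t. the stored estimates among available arms,
    # stopping D_max rounds early so the next block starts with all arms free.
    while t < block_len - D_max:
        avail = [k for k in range(K) if available(k)]
        if avail:
            pull(max(avail, key=lambda k: estimates[k]))
        t += 1
```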

Bandit Setting

By appropriately choosing the block length, we can obtain the following regret bound:

\mathcal{O}\left(\sqrt{T(2\tilde{D} + K)B_{T}}\right)

Problem: We need to know the variation budget to set the block length!

Solution: Run EXP3 as a meta-bandit algorithm to learn the correct block length!

Bandit Setting

Maintain a list of possible budgets and split the time horizon into epochs of length H.

[Figure: the time horizon from 0 to T divided into epochs of length H.]

Within each epoch:

Sample a budget, and thus an associated block length, and play the previous algorithm within the epoch.

At the end of the epoch, update the sampling probability of the chosen budget according to EXP3.

Repeat this process with the next epoch.
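
A hedged EXP3 meta-bandit sketch, assuming a helper `run_epoch(budget)` that runs the block-based algorithm for one epoch with the block length implied by that budget and returns a reward normalised to [0, 1]; names and constants are illustrative.

```python
import math
import random

def exp3_meta(candidate_budgets, num_epochs, run_epoch, eta=0.1):
    """Learn which candidate variation budget (and hence block length) works best."""
    n = len(candidate_budgets)
    weights = [1.0] * n
    for _ in range(num_epochs):
        total = sum(weights)
        probs = [w / total for w in weights]
        i = random.choices(range(n), weights=probs)[0]   # sample a budget
        reward = run_epoch(candidate_budgets[i])          # assumed to lie in [0, 1]
        # Importance-weighted EXP3 update for the chosen budget only.
        weights[i] *= math.exp(eta * reward / probs[i])
    return max(range(n), key=lambda i: weights[i])
```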

Sequential Blocked Matching

[Figure: a worked example in which agents are repeatedly matched to services over several rounds; once an agent is matched to a service, that service is blocked for that agent for a number of subsequent rounds.]

Requirements

Resistance to strategic manipulation induced by blocking - bound the incentive ratio.

Achieve high social welfare - minimise the distortion.

Repeated RSD

Generalise RSD (random serial dictatorship) by allowing each agent to choose its allocation for the entire time horizon greedily.

[Figure: agents take turns, in a random order, greedily picking their allocations across the time horizon from 0 to T.]
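
A hedged sketch of the idea, with a simple per-agent blocking rule assumed for illustration (values, the blocking duration and the schedule format are not the thesis's exact model):

```python
import random

def repeated_rsd(values, T, block=2):
    """values[i][j] is agent i's reported value for service j.
    Agents choose in a single random order; each agent greedily picks, round by
    round, its best service that is neither taken this round nor blocked for it."""
    n_agents, n_services = len(values), len(values[0])
    order = random.sample(range(n_agents), n_agents)     # serial dictatorship order
    schedule = [dict() for _ in range(T)]                # round t -> {agent: service}
    for i in order:                                      # earlier agents choose first
        blocked_until = [0] * n_services                 # per-service blocking for agent i
        for t in range(T):
            taken = set(schedule[t].values())
            free = [j for j in range(n_services)
                    if j not in taken and blocked_until[j] <= t]
            if free:
                j = max(free, key=lambda s: values[i][s])
                schedule[t][i] = j
                blocked_until[j] = t + block             # service j blocked for `block` rounds
            # if nothing is free, agent i is unmatched this round
    return schedule
```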

Repeated RSD

Repeated RSD is asymptotically optimal in terms of distortion.

Repeated RSD can be derandomised to yield a deterministic algorithm which is also asymptotically optimal!

Repeated RSD also has bounded incentive ratio!

Bandit Matching

When agent i is matched to service j, it observes a reward drawn from an unknown distribution:

m_{ij} \sim \mu_{ij}

Agents estimate their values from the rewards they observe (mean-based):

\hat{\mu}_{ij} = \frac{\sum^{T}_{t=1}\mathbf{1}[m_{i}(t) = j]\, m_{ij}(t)}{\sum^{T}_{t=1}\mathbf{1}[m_{i}(t) = j]}
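
A small illustrative snippet of this mean-based estimate from logged data for a single agent (the data layout is assumed, not taken from the thesis):

```python
import numpy as np

def empirical_means(matches, rewards, n_services):
    """matches[t]: service the agent was matched to at round t; rewards[t]: observed reward.
    Returns the mean observed reward per service (NaN if a service was never assigned)."""
    matches, rewards = np.asarray(matches), np.asarray(rewards)
    means = np.full(n_services, np.nan)
    for j in range(n_services):
        mask = matches == j
        if mask.any():
            means[j] = rewards[mask].mean()
    return means
```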

Bandit RRSD

Idea: Extend RRSD to the bandit setting with an explore-then-commit framework!

[Figure: the time horizon from 0 to T split into an exploration phase followed by an exploitation phase.]

In the exploration phase, assign each agent each service a fixed number of times.

Wait until all arms are available.

In the exploitation phase, play RRSD, using the last preference submission of each agent.
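
A hedged explore-then-commit skeleton, with the matching and feedback interfaces assumed for illustration (`explore_match`, `observe` and `rrsd` are hypothetical helpers, and the exploration is assumed to cover every agent-service pair):

```python
def bandit_rrsd(T, n_explore, agents, services, explore_match, observe, rrsd):
    """Explore for a fixed number of rounds, then commit to RRSD on learned preferences."""
    # Exploration: assign each agent each service n_explore times (round-robin style).
    samples = {(i, j): [] for i in agents for j in services}
    t = 0
    for _ in range(n_explore):
        for shift in range(len(services)):
            matching = explore_match(shift)        # e.g. agent i -> service (i + shift) mod n
            for i, j in matching.items():
                samples[(i, j)].append(observe(i, j))
            t += 1

    # Preferences implied by the empirical means (the final submission of each agent).
    prefs = {i: sorted(services,
                       key=lambda j: -sum(samples[(i, j)]) / len(samples[(i, j)]))
             for i in agents}

    # Exploitation: run RRSD with these preferences for the remaining rounds.
    return rrsd(prefs, T - t)
```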

Bandit RRSD

[Figure: a worked example of Bandit RRSD on the time horizon from 0 to T, showing the exploration-phase assignments and the rewards observed by each agent.]

Viva Voce
