Accounting for Real World Phenomena in Machine Learning and Mechanism Design

Nicholas Bishop

Decision-Making in Theory

(\textcolor{blue}{\mathbf{x}}, \textcolor{red}{y})

Real World Complications

(\textcolor{blue}{\mathbf{x}}, \textcolor{red}{y})

Insurance

[Figure: insurance premium example with quotes of $500, $470 and $450.]

Inspection doesn't scale!


Resource Deployment

[Figure: a sequence of resource deployment decisions in which deployed resources become temporarily unavailable.]

We must account for resource availability!

Goals

Consider agent incentives when making decisions

Account for real world constraints on decision making

Contributions

Stackelberg Prediction Games for Linear Regression (Chapter 3)

Adversarial Blocking Bandits (Chapter 4)

Sequential Blocked Matching (Chapter 5)


Strategic Linear Regression

At training time: \{(\textcolor{blue}{\mathbf{x}_{i}}, \textcolor{red}{y_{i}}, \textcolor{purple}{z_{i}})\}^{m}_{i=1}

At test time: an agent with data (\textcolor{blue}{\mathbf{x}}, \textcolor{red}{y}, \textcolor{purple}{z}) reports the manipulated point (\textcolor{blue}{\tilde{\mathbf{x}}}, \textcolor{red}{y}, \textcolor{purple}{z})

\textcolor{purple}{z} is the target of the data provider

Choosing a Predictor

Idea: Simulate agent behaviour using training data!

\min_{\mathbf{w}}\sum^{m}_{i=1}(\mathbf{w}^{\top}\textcolor{blue}{\mathbf{x}_{i}^{\star}} - \textcolor{red}{y_{i}})^{2}
\text{s.t. } \textcolor{blue}{\mathbf{x}_{i}^{\star}} = \arg\min_{\textcolor{blue}{\tilde{\mathbf{x}}_{i}}} (\mathbf{w}^{\top}\textcolor{blue}{\tilde{\mathbf{x}}_{i}} - \textcolor{purple}{z_{i}})^{2} + \gamma\|\textcolor{blue}{\tilde{\mathbf{x}}_{i}} - \textcolor{blue}{\mathbf{x}_{i}}\|^{2}_{2} \quad \forall i

Learner's loss

Agent's loss

Agent's manipulation cost
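
The inner problem has a closed-form solution, which is what makes simulating the agents tractable. Below is a minimal sketch (not taken from the thesis) of the agent's best response and the resulting training loss for a fixed predictor; the helper names are illustrative.

```python
import numpy as np

def agent_best_response(w, x, z, gamma):
    """Best response to predictor w: minimises (w^T x_tilde - z)^2 + gamma * ||x_tilde - x||^2.
    Setting the gradient to zero gives a closed form along the direction of w."""
    shift = (z - w @ x) / (gamma + w @ w)
    # Resulting prediction: (w^T x + (||w||^2 / gamma) z) / (1 + ||w||^2 / gamma)
    return x + shift * w

def strategic_training_loss(w, X, y, z, gamma):
    """Learner's loss when every training agent plays its best response to w."""
    loss = 0.0
    for x_i, y_i, z_i in zip(X, y, z):
        x_star = agent_best_response(w, x_i, z_i, gamma)
        loss += (w @ x_star - y_i) ** 2
    return loss
```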

Key Questions

Can we solve such an optimisation problem?

Does the solution generalise well?


Optimisation

Reformulate the problem as a fractional program

\text{arg}\min_{\mathbf{w}}\left\|\frac{\frac{1}{\gamma}\textcolor{purple}{\mathbf{z}}\|\mathbf{w}\|^{2} + \textcolor{blue}{X}\mathbf{w}}{1 + \frac{1}{\gamma}\|\mathbf{w}\|^{2}} - \textcolor{red}{\mathbf{y}}\right\|^{2}

Substitute out \|\mathbf{w}\|^{2}:

\arg\min_{\mathbf{w}, \alpha}\left\|\frac{\frac{\alpha}{\gamma}\textcolor{purple}{\mathbf{z}} + \textcolor{blue}{X}\mathbf{w}}{1 + \frac{\alpha}{\gamma}} - \textcolor{red}{\mathbf{y}}\right\|^{2}
\text{s.t. } \alpha = \|\mathbf{w}\|^{2}

Optimisation

Use fractional programming to rewrite this problem as a single-parameter root-finding problem:

\arg\min_{\mathbf{w}, \alpha}\left\|\frac{\frac{\alpha}{\gamma}\textcolor{purple}{\mathbf{z}} + \textcolor{blue}{X}\mathbf{w}}{1 + \frac{\alpha}{\gamma}} - \textcolor{red}{\mathbf{y}}\right\|^{2}
\text{s.t. } \alpha = \|\mathbf{w}\|^{2}

F(q) = \min_{\mathbf{w}, \alpha}\left\|\frac{\alpha}{\gamma}\textcolor{purple}{\mathbf{z}} + \textcolor{blue}{X}\mathbf{w} - \textcolor{red}{\mathbf{y}} - \frac{\alpha}{\gamma}\textcolor{red}{\mathbf{y}}\right\|^{2} - q\left(1 + \frac{\alpha}{\gamma}\right)^{2}
\quad \text{s.t. } \alpha = \|\mathbf{w}\|^{2}

The optimal value of the original problem is the value of q at which F(q) = 0.

Idea: Use bisection search to find a root!
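
A minimal bisection sketch of this root-finding scheme, assuming access to an oracle `evaluate_F(q)` that solves the inner minimisation (e.g. via the SDP on the next slide); the function name and tolerance are illustrative.

```python
def bisect_root(evaluate_F, q_lo, q_hi, tol=1e-8):
    """Find q with F(q) ~= 0, where F is non-increasing in q.
    Requires F(q_lo) >= 0 >= F(q_hi)."""
    while q_hi - q_lo > tol:
        q_mid = 0.5 * (q_lo + q_hi)
        if evaluate_F(q_mid) > 0:
            q_lo = q_mid   # root lies to the right
        else:
            q_hi = q_mid   # root lies to the left
    return 0.5 * (q_lo + q_hi)
```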

Optimisation

Problem: How do we evaluate F(q)?

\max_{\tau, \lambda} \:\tau
\text{s.t. }\begin{bmatrix} A + \lambda B & \mathbf{a} + \lambda \mathbf{b} \\ \mathbf{a}^{\top} + \lambda \mathbf{b}^{\top} & c - \tau \end{bmatrix} \succeq 0

Solution: Convert to an SDP!              
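
As an illustration only, an SDP of this shape can be handed to an off-the-shelf solver. In the hedged cvxpy sketch below, the matrices A, B and the vectors a, b and scalar c are treated as given constants coming from the reformulation; they are not constructed here.

```python
import cvxpy as cp

def solve_sdp(A, B, a, b, c):
    """Solve max tau s.t. [[A + lam*B, a + lam*b], [(a + lam*b)^T, c - tau]] >> 0."""
    n = A.shape[0]
    tau = cp.Variable()
    lam = cp.Variable()
    top_right = cp.reshape(a + lam * b, (n, 1))
    M = cp.bmat([[A + lam * B, top_right],
                 [top_right.T, cp.reshape(c - tau, (1, 1))]])
    # Symmetrise explicitly so the PSD constraint is well posed.
    prob = cp.Problem(cp.Maximize(tau), [(M + M.T) / 2 >> 0])
    prob.solve()
    return tau.value, lam.value
```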

Generalisation

Consider what a linear function predicts after manipulation:

h_{\mathbf{w}}(\textcolor{blue}{\mathbf{x}}, \textcolor{purple}{z}) = \frac{\mathbf{w}^{\top}\textcolor{blue}{\mathbf{x}} + \frac{1}{\gamma}\|\mathbf{w}\|^{2}\textcolor{purple}{z}}{1 + \frac{1}{\gamma}\|\mathbf{w}\|^{2}}

Each of these functions is linear in (\textcolor{blue}{\mathbf{x}}, \textcolor{purple}{z}) and has bounded norm!

Hence we can bound the Rademacher complexity of the resulting hypothesis class!
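
For intuition, here is one short way to see the linearity and the norm bound; this is a sketch of the argument, not the thesis's exact constants.

h_{\mathbf{w}}(\mathbf{x}, z) = \mathbf{v}^{\top}\begin{bmatrix}\mathbf{x} \\ z\end{bmatrix},
\qquad
\mathbf{v} = \frac{1}{1 + \frac{1}{\gamma}\|\mathbf{w}\|^{2}}\begin{bmatrix}\mathbf{w} \\ \tfrac{1}{\gamma}\|\mathbf{w}\|^{2}\end{bmatrix},
\qquad
\|\mathbf{v}\|^{2} \leq \sup_{r \geq 0}\frac{r}{(1 + r/\gamma)^{2}} + 1 = \frac{\gamma}{4} + 1 .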

Blocking Bandits

[Figure: arms with mean rewards \textcolor{blue}{\mu_{1}}, \textcolor{blue}{\mu_{2}}, \textcolor{blue}{\mu_{3}} and blocking durations \textcolor{red}{D_{1}}, \textcolor{red}{D_{2}}, \textcolor{red}{D_{3}}; a pulled arm is blocked (unavailable) for its blocking duration.]

Blocking Bandits

[Figure: a worked example with three arms with mean rewards \textcolor{blue}{1}, \textcolor{blue}{5}, \textcolor{blue}{8} and blocking durations \textcolor{red}{1}, \textcolor{red}{2}, \textcolor{red}{2}; pulled arms return noisy rewards (e.g. \textcolor{green}{1.5}, \textcolor{green}{0.9}) and become temporarily unavailable.]

Rewards and Delays in the Real World

[Figure: the same three arms, but both rewards and blocking durations change over time, e.g. from rewards (\textcolor{blue}{1}, \textcolor{blue}{5}, \textcolor{blue}{8}) with delays (\textcolor{red}{1}, \textcolor{red}{2}, \textcolor{red}{2}), to (\textcolor{blue}{10}, \textcolor{blue}{11}, \textcolor{blue}{2}) with delays (\textcolor{red}{3}, \textcolor{red}{5}, \textcolor{red}{3}), to (\textcolor{blue}{2}, \textcolor{blue}{20}, \textcolor{blue}{10}) with delays (\textcolor{red}{6}, \textcolor{red}{7}, \textcolor{red}{6}).]

Adversarial Blocking Bandits

Rewards vary adversarially in accordance with a path variation budget:

\sum^{T-1}_{t=1}\sum^{K}_{k=1}|X^{k}_{t+1} - X^{k}_{t}| \leq B_{T}

Blocking durations are free to vary arbitrarily, but are bounded above:

D^{k}_{t} \leq \tilde{D} \quad \forall k , t

Full Information Setting

Consider a greedy algorithm which pulls the available arm with the highest reward:

\text{arg}\max_{k \in A_{t}}X^{k}_{t}

Using a knapsack-style proof, we obtain the following approximation guarantee relative to the optimal reward r(\pi^{\star}):

\left(1 + \tilde{D}\right)^{-1}\left(1 - \frac{\tilde{D}B_{T}}{r(\pi^{\star})}\right)
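
A minimal sketch of this greedy policy under assumed interfaces (the reward and delay oracles and the blocking convention are illustrative, not the thesis's exact model):

```python
def greedy_blocking(T, K, reward, delay):
    """Greedy for blocking bandits with full information.

    reward(t, k) and delay(t, k) return the current reward and blocking
    duration of arm k; an arm pulled at time t with delay d is assumed
    unavailable until round t + d.
    """
    available_at = [0] * K          # first round at which each arm is free
    total = 0.0
    for t in range(T):
        avail = [k for k in range(K) if available_at[k] <= t]   # the set A_t
        if not avail:
            continue                # no arm available this round
        k = max(avail, key=lambda k: reward(t, k))   # pull highest-reward arm
        total += reward(t, k)
        available_at[k] = t + delay(t, k)            # arm becomes blocked
    return total
```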

Bandit Setting

Split the time horizon into blocks of length \Delta_{T}.

[Figure: the time horizon from 0 to T divided into blocks of length \Delta_{T}.]

Within each block:

At the start of the block, play each arm once, and store the rewards observed. Then pull no arms until all arms are available.

Then play greedily, using the rewards received in the first phase as a proxy for the real rewards.

Pull no arms at the end of the block so all arms will be available at the beginning of the next block.
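
A hedged sketch of one such block, under assumed interfaces (`pull(k)` returns the observed reward and `available(k)` says whether arm k can be pulled); this is illustrative rather than the thesis's exact pseudocode.

```python
def run_block(block_len, D_max, K, pull, available):
    """One block of the explore-then-greedy scheme (illustrative sketch)."""
    estimates = [0.0] * K
    t = 0

    # Phase 1: pull each arm once to obtain a reward estimate.
    for k in range(K):
        estimates[k] = pull(k)
        t += 1

    # Phase 2: pull nothing until every arm is available again.
    while t < block_len and not all(available(k) for k in range(K)):
        t += 1

    # Phase 3: play greedily w.r.t. the stored estimates among available arms,
    # stopping D_max rounds early so the next block starts with all arms free.
    while t < block_len - D_max:
        avail = [k for k in range(K) if available(k)]
        if avail:
            pull(max(avail, key=lambda k: estimates[k]))
        t += 1
```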

Bandit Setting

By appropriately choosing the block length, we can obtain the following regret bound:

\mathcal{O}\left(\sqrt{T(2\tilde{D} + K)B_{T}}\right)

Problem: We need to know the variation budget to set the block length!

Solution: Run EXP3 as a meta-bandit algorithm to learn the correct block length!

Bandit Setting

Maintain a list of possible budgets and split the time horizon into epochs of length H.

[Figure: the time horizon from 0 to T divided into epochs of length H.]

Within each epoch:

Sample a budget, and thus an associated block length, and play the previous algorithm within the epoch.

At the end of the epoch, update the sampling probability of the chosen budget according to EXP3.

Repeat this process with the next epoch.
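
A hedged EXP3 meta-bandit sketch, assuming a helper `run_epoch(budget)` that runs the block-based algorithm for one epoch with the block length implied by that budget and returns a reward normalised to [0, 1]; names and constants are illustrative.

```python
import math
import random

def exp3_meta(candidate_budgets, num_epochs, run_epoch, eta=0.1):
    """Learn which candidate variation budget (and hence block length) works best."""
    n = len(candidate_budgets)
    weights = [1.0] * n
    for _ in range(num_epochs):
        total = sum(weights)
        probs = [w / total for w in weights]
        i = random.choices(range(n), weights=probs)[0]   # sample a budget
        reward = run_epoch(candidate_budgets[i])          # assumed to lie in [0, 1]
        # Importance-weighted EXP3 update for the chosen budget only.
        weights[i] *= math.exp(eta * reward / probs[i])
    return max(range(n), key=lambda i: weights[i])
```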

Sequential Blocked Matching

[Figure: a worked example in which agents are repeatedly matched to services over several rounds; once an agent is matched to a service, that service is blocked for that agent for a number of subsequent rounds.]

Requirements

Resistance to strategic manipulation induced by blocking - bound the incentive ratio.

Achieve high social welfare - minimise the distortion.

Repeated RSD

Generalise RSD (random serial dictatorship) by allowing each agent to choose its allocation for the entire time horizon greedily.

[Figure: agents take turns, in a random order, greedily picking their allocations across the time horizon from 0 to T.]
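
A hedged sketch of the idea, with a simple per-agent blocking rule assumed for illustration (values, the blocking duration and the schedule format are not the thesis's exact model):

```python
import random

def repeated_rsd(values, T, block=2):
    """values[i][j] is agent i's reported value for service j.
    Agents choose in a single random order; each agent greedily picks, round by
    round, its best service that is neither taken this round nor blocked for it."""
    n_agents, n_services = len(values), len(values[0])
    order = random.sample(range(n_agents), n_agents)     # serial dictatorship order
    schedule = [dict() for _ in range(T)]                # round t -> {agent: service}
    for i in order:                                      # earlier agents choose first
        blocked_until = [0] * n_services                 # per-service blocking for agent i
        for t in range(T):
            taken = set(schedule[t].values())
            free = [j for j in range(n_services)
                    if j not in taken and blocked_until[j] <= t]
            if free:
                j = max(free, key=lambda s: values[i][s])
                schedule[t][i] = j
                blocked_until[j] = t + block             # service j blocked for `block` rounds
            # if nothing is free, agent i is unmatched this round
    return schedule
```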

Repeated RSD

Repeated RSD is asymptotically optimal in terms of distortion.

Repeated RSD can be derandomised to yield a deterministic algorithm which is also asymptotically optimal!

Repeated RSD also has bounded incentive ratio!

Bandit Matching

When agent i is matched to service j, it observes a reward drawn from an unknown distribution:

m_{ij} \sim \mu_{ij}

Agents estimate their values from the rewards they observe (mean-based):

\hat{\mu}_{ij} = \frac{\sum^{T}_{t=1}\mathbf{1}[m_{i}(t) = j]\, m_{ij}(t)}{\sum^{T}_{t=1}\mathbf{1}[m_{i}(t) = j]}
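
A small illustrative snippet of this mean-based estimate from logged data for a single agent (the data layout is assumed, not taken from the thesis):

```python
import numpy as np

def empirical_means(matches, rewards, n_services):
    """matches[t]: service the agent was matched to at round t; rewards[t]: observed reward.
    Returns the mean observed reward per service (NaN if a service was never assigned)."""
    matches, rewards = np.asarray(matches), np.asarray(rewards)
    means = np.full(n_services, np.nan)
    for j in range(n_services):
        mask = matches == j
        if mask.any():
            means[j] = rewards[mask].mean()
    return means
```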

Bandit RRSD

Idea: Extend RRSD to the bandit setting with an explore-then-commit framework!

[Figure: the time horizon from 0 to T split into an exploration phase followed by an exploitation phase.]

In the exploration phase, assign each agent each service a fixed number of times.

Wait until all arms are available.

In the exploitation phase, play RRSD, using the last preference submission of each agent.
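
A hedged explore-then-commit skeleton, with the matching and feedback interfaces assumed for illustration (`explore_match`, `observe` and `rrsd` are hypothetical helpers, and the exploration is assumed to cover every agent-service pair):

```python
def bandit_rrsd(T, n_explore, agents, services, explore_match, observe, rrsd):
    """Explore for a fixed number of rounds, then commit to RRSD on learned preferences."""
    # Exploration: assign each agent each service n_explore times (round-robin style).
    samples = {(i, j): [] for i in agents for j in services}
    t = 0
    for _ in range(n_explore):
        for shift in range(len(services)):
            matching = explore_match(shift)        # e.g. agent i -> service (i + shift) mod n
            for i, j in matching.items():
                samples[(i, j)].append(observe(i, j))
            t += 1

    # Preferences implied by the empirical means (the final submission of each agent).
    prefs = {i: sorted(services,
                       key=lambda j: -sum(samples[(i, j)]) / len(samples[(i, j)]))
             for i in agents}

    # Exploitation: run RRSD with these preferences for the remaining rounds.
    return rrsd(prefs, T - t)
```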

Bandit RRSD

[Figure: a worked example of Bandit RRSD on the time horizon from 0 to T, showing the exploration-phase assignments and the rewards observed by each agent.]

Viva Voce
