Overview

ML in Feedback Systems #1

Prof Sarah Dean

[Diagram: Machine Learning and Feedback Systems, the two themes of the course]

[Diagram: a feedback loop in which an automated system takes actions in an environment and receives measurements back]

[Diagram: the supervised learning pipeline, in which training data \(\{(x_i, y_i)\}\) is used to fit a model \(f:\mathcal X\to\mathcal Y\) mapping features to a predicted label]

ML in Feedback Systems

[Diagram: training data \(\{(x_i, y_i)\}\) is used to fit a model \(f:\mathcal X\to\mathcal Y\) that serves as a policy, mapping each observation to an action]

ML in Feedback Systems

[Diagram: training data \(\{(x_i, y_i)\}\) is used to fit a model \(f:\mathcal X\to\mathcal Y\) that maps each observation to a prediction]

Supervised learning

Training data \(\{(x_i, y_i)\}\) is sampled i.i.d. from a distribution \(\mathcal D\); features alone are drawn as \(x\sim\mathcal D_{x}\).

Goal: for a new sample \(x,y\sim \mathcal D\), the prediction \(\hat y = f(x)\) is close to the true \(y\)

Online learning

[Diagram: at each time \(t\), a model \(f_t:\mathcal X\to\mathcal Y\) maps the observation \(x_t\) to a prediction \(\hat y_t\); the revealed pairs \(\{(x_t, y_t)\}\) accumulate as training data]

Goal: cumulatively over time, predictions \(\hat y_t = f_t(x_t)\) are close to the true \(y_t\)
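A minimal sketch of this loop, assuming a linear predictor updated by online gradient descent on squared loss (the algorithm, data stream, and step size are illustrative, not fixed by the lecture):

```python
# Sketch: online learning with a linear predictor f_t(x) = w @ x,
# updated by online gradient descent on squared loss.
import numpy as np

rng = np.random.default_rng(0)
d, T, eta = 3, 200, 0.05
w = np.zeros(d)                  # current model f_t
w_true = rng.normal(size=d)      # generates the (hypothetical) stream
cumulative_loss = 0.0

for t in range(T):
    x_t = rng.normal(size=d)             # observation x_t arrives
    y_hat = w @ x_t                      # predict before the label is revealed
    y_t = w_true @ x_t                   # true label y_t is revealed
    cumulative_loss += (y_hat - y_t) ** 2
    w -= eta * 2 * (y_hat - y_t) * x_t   # gradient step on this round's loss

print("average loss over the stream:", cumulative_loss / T)
```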

(Contextual) Bandits

[Diagram: at each time \(t\), a policy \(\pi_t:\mathcal X\to\mathcal A\) maps the observation \(x_t\) to an action \(a_t\); the observed triples \(\{(x_t, a_t, r_t)\}\) accumulate as training data]

Goal: cumulatively over time, actions \(\pi_t(x_t)\) achieve high reward
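One concrete instantiation, as a sketch only: an epsilon-greedy policy over per-arm least-squares reward estimates (the linear reward model, noise level, and all parameters are illustrative assumptions, not from the lecture):

```python
# Sketch: contextual bandit loop with an epsilon-greedy policy over
# per-arm (ridge) least-squares reward estimates.
import numpy as np

rng = np.random.default_rng(0)
d, K, T, eps = 3, 4, 500, 0.1
theta = rng.normal(size=(K, d))       # true (unknown) per-arm parameters
A = [np.eye(d) for _ in range(K)]     # accumulated x x^T per arm (+ ridge)
b = [np.zeros(d) for _ in range(K)]   # accumulated r * x per arm

for t in range(T):
    x_t = rng.normal(size=d)          # observation (context) x_t
    est = [np.linalg.solve(A[k], b[k]) @ x_t for k in range(K)]
    if rng.random() < eps:
        a_t = int(rng.integers(K))    # explore: random action
    else:
        a_t = int(np.argmax(est))     # exploit: best estimated reward
    r_t = theta[a_t] @ x_t + 0.1 * rng.normal()   # reward is revealed
    A[a_t] += np.outer(x_t, x_t)      # accumulate (x_t, a_t, r_t)
    b[a_t] += r_t * x_t
```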

Online control/RL

[Diagram: at each time \(t\), a policy \(\pi_t:\mathcal X^t\to\mathcal A\) maps the history of observations to an action \(a_t\); the observed triples \(\{(x_t, a_t, r_t)\}\) accumulate as training data]

Goal: select actions \(a_t\) to bring the environment to a high-reward state

Unlike in bandits, actions now affect the environment's future state, so the policy may depend on the whole observation history.
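A bare-bones sketch of the interaction loop, with made-up scalar dynamics (an illustrative assumption) to show the key difference: the action changes the state that generates future observations:

```python
# Sketch: control/RL interaction loop with made-up scalar dynamics.
import numpy as np

rng = np.random.default_rng(0)
T, state = 50, 5.0
history, total_reward = [], 0.0

for t in range(T):
    x_t = state + 0.1 * rng.normal()   # noisy observation of the state
    history.append(x_t)                # pi_t may use the whole history
    a_t = -0.5 * x_t                   # a simple feedback policy
    r_t = -x_t ** 2                    # high reward near the target state 0
    total_reward += r_t
    state = state + a_t                # the action moves the environment

print("final state:", state, " total reward:", total_reward)
```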

Topics and Schedule

  • Unit 1: Learning to predict (Aug-Sept)
    • Supervised learning & Fairness
    • Online learning
    • Dynamical systems & Stability
  • Unit 2: Learning to act (Oct-Nov)
    • Multi-armed Bandits
    • Control/RL & Robustness
    • Model predictive control & Safety
  • Detailed Calendar

Prerequisites

Assignments

  • 10% participation
  • 20% scribing
  • 20% paper presentation
  • 50% final project

Participation expectation: actively ask questions and contribute to discussions

  • in class (in person when possible)
  • and/or on Ed Discussions (exercises)

Scribing

  • high quality notes using the Tufte-handout template
  • summarize the lecture and expand upon it
  • draft due one week after lecture; revision due one week after feedback
  • Sign up sheet

Paper presentations

  • group of 2-3 responsible for presenting and leading discussion
    • single or multiple papers
  • assigned based on ranked choice, full list of papers here
  • should cover motivation, problem statement, prior work, main results, technical tools, and future work
  • first paper presentations 9/12 and 9/14
    • HSNL18 Fairness Without Demographics in Repeated Loss Minimization
    • PZMH20 Performative Prediction

Final Project

  • topic that connects class material to your research
  • groups of up to three
  • deliverables:
    • Project proposal (1 page) due mid-October
    • Midterm update (2 pages) due mid-November
    • Project report (4-6 pages) due last day of class

Introductions

How would you design a classifier?

[Slide shows example images paired with labels: two \((\qquad,\text{sitting})\), two \((\qquad,\text{standing})\), and a new image \((\qquad,\text{?})\) to classify, where each blank is a photo]

How would you design a classifier?

Predict \(\hat y = \hat f(\qquad)\) for a new image, where \(\hat f\) is chosen to fit the training examples:

$$\widehat f = \arg\min_{f\in\mathcal F} \sum_{i=1}^N \ell(y_i, f(x_i))$$

Loss functions

\(\ell(y,\hat y)\) measures the "loss" of predicting \(\hat y\) when the true label is \(y\)

Ex - classification

  • \(\mathbb{1}\{y\neq\hat y\}\) (0-1 loss)
  • \(\max\{0, \hat y-y\}\) (asymmetric: only over-prediction is penalized)

Ex - regression

  • \(|\hat y-y|\) (absolute loss)
  • \((\hat y-y)^2\) (squared loss)
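These are one-liners in code; a small sketch (the 0/1 label encoding for classification is an assumption of the sketch):

```python
# The losses above as code.
def zero_one(y, y_hat):
    return float(y != y_hat)          # 1{y != y_hat}

def one_sided(y, y_hat):
    return max(0.0, y_hat - y)        # penalizes over-prediction only

def absolute(y, y_hat):
    return abs(y_hat - y)

def squared(y, y_hat):
    return (y_hat - y) ** 2

print(zero_one(1, 0), squared(0.5, 0.2))
```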

Risk

The risk of a predictor \(f\) over a distribution \(\mathcal D\) is the expected (average) loss

$$\mathcal R(f) = \mathbb E_{x,y\sim\mathcal D}[\ell(y, f(x))]$$

Claim: The predictor with the lowest possible risk is

  • \(\mathbb E[y\mid x]\) for squared loss
  • \(\mathbb 1\{\mathbb E[y\mid x]\geq t\}\) for 0-1 loss, with \(t\) depending on \(\mathcal D\)

Proof: exercise. Hint: use the tower property of expectation.
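A sketch of the squared-loss case (the 0-1 case is left to the exercise): writing \(\mathbb E[y\mid x]\) for the conditional mean,

$$\mathbb E[(y - f(x))^2 \mid x] = \underbrace{\mathbb E\big[(y-\mathbb E[y\mid x])^2\mid x\big]}_{\text{independent of } f} + \big(\mathbb E[y\mid x] - f(x)\big)^2,$$

since the cross term \(2\big(\mathbb E[y\mid x]-f(x)\big)\,\mathbb E\big[y-\mathbb E[y\mid x]\mid x\big]\) vanishes. Taking the expectation over \(x\) (tower property), the risk is minimized pointwise by \(f(x)=\mathbb E[y\mid x]\).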

Prediction errors

Loss determines trade-offs between (potentially inevitable) errors

Ex - sit/stand classifier with \(x=\) position of face in frame

  • \(\ell(\text{sitting},\text{sitting})=0\) and \(\ell(\text{standing},\text{standing})=0\): correct predictions cost nothing
  • \(\ell(\text{sitting},\text{standing})\) and \(\ell(\text{standing},\text{sitting})\): the designer chooses the relative cost of each type of mistake (see the sketch below)
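A sketch of how asymmetric costs shift the decision rule (the cost values, names, and probability below are illustrative assumptions):

```python
# Sketch: asymmetric losses move the optimal decision threshold.
c_stand_as_sit = 5.0   # loss ell(standing, sitting): miss a standing person
c_sit_as_stand = 1.0   # loss ell(sitting, standing): false alarm

def best_prediction(p):
    """p = P(standing | x); pick the label with lower expected loss."""
    exp_loss_stand = (1 - p) * c_sit_as_stand  # wrong only if truly sitting
    exp_loss_sit = p * c_stand_as_sit          # wrong only if truly standing
    return "standing" if exp_loss_stand < exp_loss_sit else "sitting"

# With these costs the threshold is not 1/2: even p = 0.3 favors "standing".
print(best_prediction(0.3))
```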

Discrimination

In many domains, decisions have moral and legal significance

Harms can occur at many levels

  1. Correctness: who is burdened by errors?
  2. Stereotyping: which correlations are permissible?
  3. Specification: who is left out?

Sample vs. population

Fundamental Theorem of Supervised Learning:

  • The risk is bounded by the empirical risk plus the generalization error. $$ \mathcal R(f) \leq \mathcal R_N(f) + |\mathcal R(f) - \mathcal R_N(f)|$$
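To see the gap empirically, a quick synthetic sketch (not from the lecture): a flexible model fit to few samples has small empirical risk \(\mathcal R_N(f)\) but larger risk \(\mathcal R(f)\) on fresh draws from the same distribution.

```python
# Sketch: empirical risk vs. risk for an overly flexible model.
import numpy as np

rng = np.random.default_rng(0)
N, degree = 12, 9
x_train = rng.uniform(-1, 1, N)
y_train = np.sin(3 * x_train) + 0.1 * rng.normal(size=N)

coeffs = np.polyfit(x_train, y_train, degree)   # fit degree-9 polynomial
x_test = rng.uniform(-1, 1, 10_000)             # fresh samples from D
y_test = np.sin(3 * x_test) + 0.1 * rng.normal(size=10_000)

train_risk = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)  # R_N(f)
test_risk = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)     # ~ R(f)
print(f"empirical risk: {train_risk:.4f}   risk (estimate): {test_risk:.4f}")
```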

Empirical risk minimization

$$\hat f = \arg\min_{f\in\mathcal F} \underbrace{\frac{1}{N} \sum_{i=1}^N \ell(y_i, f(x_i))}_{\mathcal R_N(f)}$$
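A minimal sketch of ERM, assuming squared loss and the linear model class \(\mathcal F = \{x\mapsto w^\top x\}\) on synthetic data; for this pair the empirical risk minimizer is ordinary least squares:

```python
# Sketch: ERM with squared loss over linear models = least squares.
import numpy as np

rng = np.random.default_rng(0)
N, d = 100, 3
X = rng.normal(size=(N, d))                 # features x_i
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=N)   # labels y_i

def empirical_risk(w):
    """R_N(w) = (1/N) sum_i (y_i - w @ x_i)^2."""
    return np.mean((y - X @ w) ** 2)

w_hat = np.linalg.lstsq(X, y, rcond=None)[0]  # arg min of R_N over F
print("w_hat:", w_hat, " R_N(w_hat):", empirical_risk(w_hat))
```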

  1. Representation: does \(\mathcal F\) contain a low-risk predictor?
  2. Optimization: can we efficiently find a good \(f\in\mathcal F\)?
  3. Generalization: does low empirical risk imply low risk?

Recap

[Diagram: training data \(\{(x_i, y_i)\}\) drawn from \(\mathcal D\) is used to fit a model \(f:\mathcal X\to\mathcal Y\)]

  1. define loss
  2. do ERM

Performance depends on representation, optimization, and generalization.

Next time: more on fairness & non-discrimination, then a linear regression case study

Ref: Ch 2-3 of Hardt & Recht, "Patterns, Predictions, and Actions" mlstory.org