Overview
ML in Feedback Sys #1
Prof Sarah Dean
Machine Learning in Feedback Systems
[Diagram] An automated system interacts with its environment: it takes actions and receives measurements. Training data \(\{(x_i, y_i)\}\) is used to fit a model \(f:\mathcal X\to\mathcal Y\) mapping features to a predicted label.

ML in Feedback Systems
[Diagram] Training data \(\{(x_i, y_i)\}\) is used to fit a model \(f:\mathcal X\to\mathcal Y\); a policy maps observations to actions.
Supervised learning
- training data \(\{(x_i, y_i)\}\) sampled i.i.d. from \(\mathcal D\)
- model \(f:\mathcal X\to\mathcal Y\) maps an observation \(x\sim\mathcal D_{x}\) to a prediction
- Goal: for a new sample \(x,y\sim \mathcal D\), the prediction \(\hat y = f(x)\) is close to the true \(y\)
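A minimal sketch of this setting in Python (the toy distribution, linear model class, and variable names below are illustrative assumptions, not part of the course materials):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_D(n):
    # Toy stand-in for D: x ~ N(0, 1), y = 2x + small noise
    x = rng.normal(size=n)
    y = 2.0 * x + 0.1 * rng.normal(size=n)
    return x, y

# training data {(x_i, y_i)} sampled i.i.d. from D
x_train, y_train = sample_D(100)

# fit a model f: X -> Y (least-squares slope through the origin)
w = np.dot(x_train, y_train) / np.dot(x_train, x_train)
f = lambda x: w * x

# goal check: for a fresh sample (x, y) ~ D, f(x) should be close to y
x_new, y_new = sample_D(1)
print("prediction:", f(x_new)[0], "true label:", y_new[0])
```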

Online learning
- at each time \(t\), a model \(f_t:\mathcal X\to\mathcal Y\) maps the observation \(x_t\) to a prediction \(\hat y_t\)
- labeled data \(\{(x_t, y_t)\}\) accumulates over time
- Goal: cumulatively over time, predictions \(\hat y_t = f_t(x_t)\) are close to the true \(y_t\)
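A rough sketch of this protocol, here with a scalar linear predictor updated by online gradient descent on squared loss (the data stream and step size are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
w, eta = 0.0, 0.1        # parameter of the current model f_t and a step size
cumulative_loss = 0.0

for t in range(200):
    x_t = rng.normal()                       # observe x_t
    y_hat = w * x_t                          # predict with f_t
    y_t = 2.0 * x_t + 0.1 * rng.normal()     # true label revealed after predicting
    cumulative_loss += (y_hat - y_t) ** 2    # performance is judged cumulatively
    w -= eta * 2 * (y_hat - y_t) * x_t       # update f_t -> f_{t+1} using (x_t, y_t)

print("average squared loss over time:", cumulative_loss / 200)
```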

(Contextual) Bandits
- at each time \(t\), a policy \(\pi_t:\mathcal X\to\mathcal A\) maps the observation \(x_t\) to an action \(a_t\)
- data \(\{(x_t, a_t, r_t)\}\) accumulates over time
- Goal: cumulatively over time, actions \(\pi_t(x_t)\) achieve high reward
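A sketch of the bandit interaction loop, where only the reward of the chosen action is revealed (an ε-greedy rule on a made-up problem; all quantities here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, eps = 3, 0.1
# crude policy: running mean reward estimate per (context sign, action)
counts = np.zeros((2, n_actions))
totals = np.zeros((2, n_actions))
data = []   # accumulated {(x_t, a_t, r_t)}

for t in range(500):
    x_t = rng.normal()                        # observe context x_t
    c = int(x_t > 0)                          # discretize context for the toy policy
    if rng.random() < eps:                    # explore
        a_t = int(rng.integers(n_actions))
    else:                                     # exploit current estimates
        est = totals[c] / np.maximum(counts[c], 1)
        a_t = int(np.argmax(est))
    # reward is only revealed for the chosen action a_t (bandit feedback)
    r_t = rng.normal(loc=1.0 if a_t == c else 0.0, scale=0.1)
    counts[c, a_t] += 1
    totals[c, a_t] += r_t
    data.append((x_t, a_t, r_t))

print("average reward:", np.mean([r for _, _, r in data]))
```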

Online control/RL
- at each time \(t\), a policy \(\pi_t:\mathcal X^t\to\mathcal A\) maps the history of observations to an action \(a_t\)
- data \(\{(x_t, a_t, r_t)\}\) accumulates over time
- Goal: select actions \(a_t\) to bring the environment to a high-reward state
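A sketch of the control/RL loop, in which the action also moves the environment's state, so the policy may depend on past observations (a toy scalar system with a hand-picked proportional controller; the dynamics and reward are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x_t = 5.0            # initial state, far from the high-reward state x = 0
history = []         # accumulated {(x_t, a_t, r_t)}

def policy(history, x_t):
    # simple policy: proportional feedback on the latest observation
    # (in general pi_t may use the whole history of observations)
    return -0.5 * x_t

for t in range(50):
    a_t = policy(history, x_t)                  # choose action a_t
    r_t = -x_t ** 2                             # reward is high near x = 0
    x_next = x_t + a_t + 0.01 * rng.normal()    # the action drives the next state
    history.append((x_t, a_t, r_t))
    x_t = x_next

print("final state:", x_t, "final reward:", -x_t ** 2)
```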
Topics and Schedule
- Unit 1: Learning to predict (Aug-Sept)
  - Supervised learning & Fairness
  - Online learning
  - Dynamical systems & Stability
- Unit 2: Learning to act (Oct-Nov)
  - Multi-armed Bandits
  - Control/RL & Robustness
  - Model predictive control & Safety
- Detailed Calendar
Prerequisites
- Machine learning
- Linear algebra, convex optimization, and probability
- Lectures will focus on theoretical foundations
- A focus on practical concerns and applications is welcome for discussion and projects!
Assignments
- 10% participation
- 20% scribing
- 20% paper presentation
- 50% final project
Participation expectation: actively ask questions and contribute to discussions
- in class (in person when possible)
- and/or on Ed Discussions (exercises)
Scribing
- high-quality notes using the Tufte-handout template
- summarize the lecture and expand upon it
- draft due one week after lecture; revision due a week after feedback
- Sign up sheet
Paper presentations
- groups of 2-3 are responsible for presenting and leading discussion
- single or multiple papers
- assigned based on ranked choice; full list of papers here
- should cover motivation, problem statement, prior work, main results, technical tools, and future work
- first paper presentations 9/12 and 9/14
- HSNL18 Fairness Without Demographics in Repeated Loss Minimization
- PZMH20 Performative Prediction
Final Project
- topic that connects class material to your research
- groups of up to three
- deliverables:
- Project proposal (1 page) due mid-October
- Midterm update (2 pages) due mid-November
- Project report (4-6 pages) due last day of class
Introductions
How would you design a classifier?
[Images omitted] Example labeled pairs:
\((\qquad,\text{sitting})\)
\((\qquad,\text{sitting})\)
\((\qquad,\text{standing})\)
\((\qquad,\text{standing})\)
\((\qquad,\text{?})\)
How would you design a classifier?

\(\hat y = \hat f(\qquad)\)
$$\widehat f = \arg\min_{f\in\mathcal F} \sum_{i=1}^N \ell(y_i, f(x_i))$$
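One naive way to read this formula in code: enumerate a small model class \(\mathcal F\) of threshold classifiers and keep the member with the smallest total loss (the features and data below are invented to mirror the sit/stand example):

```python
import numpy as np

# toy training data: x_i = vertical position of face in frame, y_i = 1 if standing
x = np.array([0.2, 0.3, 0.7, 0.8])
y = np.array([0, 0, 1, 1])

def zero_one_loss(y_true, y_pred):
    return float(y_true != y_pred)

# model class F: threshold classifiers f_c(x) = 1{x >= c}
thresholds = np.linspace(0, 1, 101)

def total_loss(c):
    return sum(zero_one_loss(yi, int(xi >= c)) for xi, yi in zip(x, y))

# empirical risk minimization: pick the f in F with the smallest summed loss
c_hat = min(thresholds, key=total_loss)
f_hat = lambda x_new: int(x_new >= c_hat)

print("chosen threshold:", c_hat, "prediction for x=0.5:", f_hat(0.5))
```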
Loss functions
\(\ell(y,\hat y)\) measures the "loss" of predicting \(\hat y\) when it's actually \(y\)
Ex - classification
- \(\mathbb{1}\{y\neq\hat y\}\) (0-1 loss)
- \(\max\{0, \hat y-y\}\)
Ex - regression
- \(|\hat y-y|\) (absolute loss)
- \((\hat y-y)^2\) (squared loss)
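These losses written out in Python (a direct transcription; binary labels are assumed to be 0/1 for the classification losses):

```python
# classification losses (labels assumed in {0, 1})
zero_one = lambda y, y_hat: float(y != y_hat)      # 1{y != y_hat}
one_sided = lambda y, y_hat: max(0.0, y_hat - y)   # nonzero only when y_hat > y

# regression losses
absolute = lambda y, y_hat: abs(y_hat - y)
squared = lambda y, y_hat: (y_hat - y) ** 2

print(zero_one(1, 0), one_sided(0, 1), absolute(2.0, 1.5), squared(2.0, 1.5))
```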
Risk
The risk of a predictor \(f\) over a distribution \(\mathcal D\) is the expected (average) loss
$$\mathcal R(f) = \mathbb E_{x,y\sim\mathcal D}[\ell(y, f(x))]$$
Claim: The predictor with the lowest possible risk is
- \(\mathbb E[y| x]\) for squared loss
- \(\mathbb 1\{\mathbb E[y| x]\geq t\}\) for 0-1 loss, with \(t\) depending on \(\mathcal D\)
Proof: exercise. Hint: use tower property of expectation.
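To illustrate the hinted argument for the squared-loss case (a sketch only; completing it carefully is the exercise): conditioning on \(x\) and using the tower property,
$$\mathcal R(f) = \mathbb E_{x}\Big[\,\mathbb E\big[(y - f(x))^2 \mid x\big]\Big] = \mathbb E_{x}\Big[\operatorname{Var}(y\mid x) + \big(\mathbb E[y\mid x] - f(x)\big)^2\Big],$$
so the inner term is minimized pointwise by choosing \(f(x) = \mathbb E[y\mid x]\), and the conditional variance is irreducible.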
Prediction errors
Loss determines trade-offs between (potentially inevitable) errors
Ex - sit/stand classifier with \(x=\) position of face in frame
- \(\ell(\)sitting\(,\)sitting\()=0\)
- \(\ell(\)standing\(,\)standing\()=0\)
- \(\ell(\)sitting\(,\)standing\()\): cost of predicting standing when the person is actually sitting
- \(\ell(\)standing\(,\)sitting\()\): cost of predicting sitting when the person is actually standing
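As a worked illustration (writing \(p = \mathbb P(\text{standing}\mid x)\), a symbol introduced here for convenience): predicting standing incurs conditional expected loss \((1-p)\,\ell(\text{sitting},\text{standing})\), while predicting sitting incurs \(p\,\ell(\text{standing},\text{sitting})\), so the risk-minimizing rule predicts standing exactly when
$$p \;\geq\; \frac{\ell(\text{sitting},\text{standing})}{\ell(\text{sitting},\text{standing}) + \ell(\text{standing},\text{sitting})}.$$
Raising the cost assigned to an error shifts this threshold so that the classifier makes that error more rarely.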
Discrimination
In many domains, decisions have moral and legal significance
Harms can occur at many levels
- Correctness: who is burdened by errors?
- Stereotyping: which correlations are permissible?
- Specification: who is left out?



Sample vs. population
Fundamental Theorem of Supervised Learning:
- The risk is bounded by the empirical risk plus the generalization error. $$ \mathcal R(f) \leq \mathcal R_N(f) + |\mathcal R(f) - \mathcal R_N(f)|$$
Empirical risk minimization
$$\hat f = \arg\min_{f\in\mathcal F}\ \underbrace{\frac{1}{N} \sum_{i=1}^N \ell(y_i, f(x_i))}_{\mathcal R_N(f)}$$
1. Representation
2. Optimization
3. Generalization
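A quick numerical illustration of the gap between \(\mathcal R_N(f)\) and \(\mathcal R(f)\) (the toy distribution and sample sizes are assumptions for illustration; the population risk is approximated by Monte Carlo):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    # toy D: x ~ N(0, 1), y = x + noise
    x = rng.normal(size=n)
    return x, x + 0.5 * rng.normal(size=n)

# small training set; ERM over linear predictors f(x) = w x with squared loss
x_tr, y_tr = sample(10)
w = np.dot(x_tr, y_tr) / np.dot(x_tr, x_tr)

emp_risk = np.mean((y_tr - w * x_tr) ** 2)     # R_N(f): empirical risk on the sample
x_te, y_te = sample(100_000)
pop_risk = np.mean((y_te - w * x_te) ** 2)     # Monte Carlo estimate of R(f)

print("empirical risk:", emp_risk)
print("population risk (estimate):", pop_risk)
print("generalization gap:", abs(pop_risk - emp_risk))
```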
Recap
- training data \(\{(x_i, y_i)\}\) sampled from \(\mathcal D\) is used to fit a model \(f:\mathcal X\to\mathcal Y\): define a loss, then do ERM
- performance depends on representation, optimization, and generalization
Next time: more on fairness & non-discrimination, then linear regression case study
Ref: Ch 2-3 of Hardt & Recht, "Patterns, Predictions, and Actions" mlstory.org