Overview

ML in Feedback Sys #1

Fall 2025, Prof Sarah Dean

[Diagram: Machine Learning meets Feedback Systems: an automated system acts on its environment and receives measurements in return.]

[Diagram: training data \(\{(x_i, y_i)\}\) is used to fit a model \(p:\mathcal X\to\mathcal Y\) that maps features to a predicted label.]

Notation

  • features: \(x\in\mathcal X\)
  • labels: \(y\in\mathcal Y\)
  • data index \(i\)
  • predictive model \(p\)

ML and Feedback

Training algorithms and auto-regressive models:

[Diagram: the model evolves over time: \(p_{\theta_0}:\mathcal X\to\mathcal Y\) predicts \(\hat y_1\) from \(x_1\), compared against \(y_1\); the updated model \(p_{\theta_1}:\mathcal X\to\mathcal Y\) predicts \(\hat y_2\) from \(x_2\), compared against \(y_2\); \(p_{\theta_2}:\mathcal X\to\mathcal Y\) predicts \(\hat y_3\) from \(x_3\), compared against \(y_3\); and so on.]

ML in Feedback Systems

[Diagram: training data \(\{(x_i, y_i)\}\) produces a model \(p:\mathcal X\to\mathcal Y\), which is used by a policy mapping observations to actions.]

ML in Feedback Systems

[Diagram: training data \(\{(x_i, y_i)\}\) produces a model \(p:\mathcal X\to\mathcal Y\), which maps observations to predictions.]

Supervised learning

[Diagram: training data sampled i.i.d. from \(\mathcal D\); at test time, features arrive as \(x\sim\mathcal D_{x}\).]

Goal: for new sample \(x,y\sim \mathcal D\), prediction \(\hat y = p(x)\) is close to true \(y\)

Notation

  • data distribution \(\mathcal D\)
  • marginal feature distribution \(\mathcal D_x\)
  • prediction \(\hat y\)

Sequential data

[Diagram: observations \(\{x_t\}\) accumulate over time; a model \(p:\mathcal X^t\to\mathcal X\) maps past observations to a prediction of the next. At each time \(t\), the observation is drawn as \(x_t\sim\mathcal D_{t}\).]

Goal: for new observation \(x_t\sim \mathcal D_t\), prediction \(\hat x_t\) is close to true \(x_t\)

Notation

  • time index \(t\)
  • distribution \(\mathcal D_t\)
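As a minimal sketch of such a next-step predictor, one could fit a linear auto-regressive model by least squares; the scalar observations and window length \(k\) below are illustrative assumptions, not choices from the slides.

```python
import numpy as np

def fit_ar_predictor(x_history, k=3):
    """Fit a linear next-step predictor on windows of the last k observations."""
    X = np.array([x_history[i:i + k] for i in range(len(x_history) - k)])
    y = np.array(x_history[k:])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def predict_next(x_history, w, k=3):
    """Model p: X^t -> X, here using only the last k observations."""
    return float(np.dot(w, x_history[-k:]))

# illustrative usage on a noisy sinusoid
rng = np.random.default_rng(0)
xs = list(np.sin(0.3 * np.arange(50)) + 0.05 * rng.standard_normal(50))
w = fit_ar_predictor(xs, k=3)
print(predict_next(xs, w, k=3))
```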

Online learning

[Diagram: at each time \(t\), a model \(p_t:\mathcal X\to\mathcal Y\) maps the observation \(x_t\) to a prediction.]

Goal: cumulatively over time, predictions \(\hat y_t = p_t(x_t)\) are close to true \(y_t\)

[Diagram: the labeled pairs \(\{(x_t, y_t)\}\) accumulate over time.]
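A minimal sketch of this protocol, assuming (purely for illustration) a linear model, squared loss, and an online gradient update:

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, eta = 5, 200, 0.05
w_true = rng.standard_normal(d)          # unknown "environment"
w = np.zeros(d)                          # online model p_t(x) = w @ x
cumulative_loss = 0.0

for t in range(T):
    x_t = rng.standard_normal(d)         # observation arrives
    y_hat = w @ x_t                      # predict with the current model p_t
    y_t = w_true @ x_t + 0.1 * rng.standard_normal()  # true label revealed
    cumulative_loss += (y_hat - y_t) ** 2
    w -= eta * 2 * (y_hat - y_t) * x_t   # gradient step on this round's loss

print(cumulative_loss / T)               # average loss accumulated over time
```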

(Contextual) Bandits

[Diagram: at each time \(t\), a policy \(\pi_t:\mathcal X\to\mathcal A\) maps the observed context \(x_t\) to an action.]

Goal: cumulatively over time, actions \(\pi_t(x_t)\) achieve high reward

[Diagram: the action \(a_t\) yields a reward, and the data \(\{(x_t, a_t, r_t)\}\) accumulate over time.]

Notation

  • action (decision) \(a\in\mathcal A\)
  • reward \(r\in\mathbb R\)
  • policy \(\pi\)
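A minimal sketch of the bandit interaction loop, assuming (for illustration only) a small finite action set, linear rewards, and an epsilon-greedy policy:

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T, eps = 4, 3, 500, 0.1
theta_true = rng.standard_normal((K, d))     # unknown reward model per action
theta_hat = np.zeros((K, d))                 # estimated reward model
counts = np.zeros(K)
data = []                                    # accumulated (x_t, a_t, r_t)

for t in range(T):
    x_t = rng.standard_normal(d)                     # observe context
    if rng.random() < eps:                           # explore
        a_t = int(rng.integers(K))
    else:                                            # exploit current estimates
        a_t = int(np.argmax(theta_hat @ x_t))
    r_t = theta_true[a_t] @ x_t + 0.1 * rng.standard_normal()  # bandit feedback
    data.append((x_t, a_t, r_t))
    counts[a_t] += 1
    # simple stochastic update of the chosen arm's reward estimate
    theta_hat[a_t] += (1.0 / counts[a_t]) * (r_t - theta_hat[a_t] @ x_t) * x_t

print(sum(r for _, _, r in data) / T)        # average reward achieved
```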

Online Optimal Control

[Diagram: at each time \(t\), a policy \(\pi_t:\mathcal X^t\to\mathcal A\) maps the history of observations to an action.]

Goal: select actions \(a_t\) to bring environment to high-reward state

[Diagram: the action \(a_t\) drives the environment's state, and the data \(\{(x_t, a_t, r_t)\}\) accumulate over time.]
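A minimal sketch of the control interaction loop; the scalar linear dynamics, feedback gain, and quadratic reward below are illustrative assumptions, and this simple policy reacts only to the latest observation in its history:

```python
import numpy as np

rng = np.random.default_rng(0)
a_dyn, b_dyn = 0.9, 1.0   # illustrative dynamics x_{t+1} = a x_t + b a_t + noise
k_gain = 0.5              # illustrative feedback gain
T = 50

history = [5.0]           # observations x_0, x_1, ...
data = []                 # accumulated (x_t, a_t, r_t)

for t in range(T):
    x_t = history[-1]
    a_t = -k_gain * x_t                  # policy pi_t: history -> action
    r_t = -x_t ** 2                      # high reward = state near the origin
    data.append((x_t, a_t, r_t))
    x_next = a_dyn * x_t + b_dyn * a_t + 0.05 * rng.standard_normal()
    history.append(x_next)

print(history[-1])        # state is driven toward the high-reward region
```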

Topics and Schedule

  • Unit 1: Learning to predict (Aug-Sept)
    • Supervised learning
    • Models of sequential data
    • Online learning
  • Unit 2: Learning to act (Oct-Nov)
    • (Contextual) Bandits
    • Model predictive control
    • Policy learning
  • Materials and Detailed Calendar

Prerequisites

Workload

  • 10% participation in lectures
  • 25% weekly assignments
  • 25% paper presentation
  • 40% final project

Participation expectation: actively ask questions and contribute to discussions

Assignments

  • Weekly assignments contribute to a collaborative GitHub repository
  • Flexible: some coding is usually required, but you may choose between more coding and more writing/math
  • github.com/ml-feedback-sys/collaborative-f25/
  • To join, you must fill out the form linked in the Syllabus

Paper presentations

  • Latter half of the semester
  • Students present selected papers and lead a discussion
    • List and sign-up to come
  • Weekly assignments for non-presenters contribute to the discussion
  • Presenters must meet with the TA in advance of their presentation

Final Project

  • Topic that connects class material to your research
  • Groups of up to three
  • Deliverables:
    • Project proposal (1 page) due mid-October
    • Midterm update (2 pages) due mid-November
    • Project report (4-6 pages) due last day of class

Introductions

How would you design a classifier?

\((\qquad,\text{sitting})\)

\((\qquad,\text{sitting})\)

\((\qquad,\text{standing})\)

\((\qquad,\text{standing})\)

\((\qquad,\text{?})\)

How would you design a classifier?

\(\hat y =  p_{\theta_T}(\qquad)\)

What we do:

  • Initialize \(\theta_0\)

  • For \(t=0,...,T\)

    • Sample \((x_i, y_i)\) from the dataset and update $$\theta_{t+1} =  \theta_t - \eta \nabla_\theta \ell(y_i, p_{\theta}(x_i))\big|_{\theta=\theta_t}$$
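As a concrete sketch of this loop, the snippet below runs stochastic gradient descent on a logistic model with log loss; the model, loss, and synthetic data are illustrative assumptions, not what the slide specifies.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, T, eta = 100, 3, 1000, 0.1
X = rng.standard_normal((N, d))
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)   # synthetic labels

def loss_grad(theta, x_i, y_i):
    """Gradient of the logistic loss l(y_i, p_theta(x_i)) with respect to theta."""
    p = 1.0 / (1.0 + np.exp(-x_i @ theta))
    return (p - y_i) * x_i

theta = np.zeros(d)                  # initialize theta_0
for t in range(T):                   # for t = 0, ..., T
    i = rng.integers(N)              # sample (x_i, y_i) from the dataset
    theta = theta - eta * loss_grad(theta, X[i], y[i])    # gradient step

y_hat = (1.0 / (1.0 + np.exp(-X @ theta)) > 0.5).astype(float)
print((y_hat == y).mean())           # training accuracy of p_{theta_T}
```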

How would you design a classifier?

\(\hat y = \hat p(\qquad)\)

Why we do it:

$$\widehat p \approx \arg\min_{p\in\mathcal P} \underbrace{\frac{1}{N}\sum_{i=1}^N \ell(y_i, p(x_i))}_{\approx\, \mathbb E[\ell(y, p(x))]}, \quad\mathcal P = \{p_\theta \mid \theta\in\mathbb R^d\}$$

Recall the goal: for new sample \(x,y\sim \mathcal D\), prediction \(\hat y\) is close to true \(y\)

Loss functions

Ex - classification

  • 0-1 loss \(\mathbb{1}\{y\neq\hat y\}\)
  • hinge loss \(\max\{0, 1-y\hat y\}\) (for \(y\in\{-1,+1\}\))

\(\ell(y,\hat y)\)  measures "loss" of predicting \(\hat y\) when it's actually \(y\)

Ex - regression

  • absolute error \(|\hat y-y|\)
  • squared error \((\hat y-y)^2\)
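These losses translate directly into code; a small sketch, where the hinge loss assumes labels in \(\{-1,+1\}\) and a real-valued score \(\hat y\):

```python
def zero_one_loss(y, y_hat):
    return float(y != y_hat)

def hinge_loss(y, y_hat):
    # assumes y in {-1, +1} and y_hat is a real-valued score
    return max(0.0, 1.0 - y * y_hat)

def absolute_error(y, y_hat):
    return abs(y_hat - y)

def squared_error(y, y_hat):
    return (y_hat - y) ** 2

print(hinge_loss(+1, 0.3), squared_error(2.0, 1.5))
```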

Risk

Claim: The predictor with the lowest possible risk is

  • \(\mathbb E[y| x]\) for squared loss
  • \(\mathbb 1\{\mathbb E[y| x]\geq t\}\) for  0-1 loss, with \(t\) depending on \(\mathcal D\)

The risk of a predictor \(p\) over a distribution \(\mathcal D\) is the expected (average) loss

$$\mathcal R(p) = \mathbb E_{x,y\sim\mathcal D}[\ell(y, p(x))]$$

Proof: exercise. Hint: use tower property of expectation.
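For reference, a sketch of the squared-loss case, following the hint:

$$\mathbb E\big[(y - p(x))^2\big] = \mathbb E\big[(y - \mathbb E[y\mid x])^2\big] + \mathbb E\big[(\mathbb E[y\mid x] - p(x))^2\big],$$

since the cross term \(2\,\mathbb E\big[(y-\mathbb E[y\mid x])(\mathbb E[y\mid x]-p(x))\big]\) vanishes: conditioning on \(x\) and applying the tower property gives \(\mathbb E\big[y-\mathbb E[y\mid x]\mid x\big]=0\). The first term does not depend on \(p\), so the risk is minimized by \(p(x)=\mathbb E[y\mid x]\).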

Risk Minimization

Goal: for new sample \(x,y\sim \mathcal D\), prediction \(\hat y = p(x)\) is close to true \(y\)

\(\ell(y,\hat y)\)  measures "loss" of predicting \(\hat y\) when it's actually \(y\)

\(\implies\) Encode our goal in risk minimization framework:

$$\min_{p\in\mathcal P}\mathcal R(p) = \mathbb E_{x,y\sim\mathcal D}[\ell(y, p(x))]$$

\(\approx \mathcal R_N(p) = \frac{1}{N}\sum_{i=1}^N \ell(y_i, p(x_i))\)

1. Representation

2. Optimization

3. Generalization

\(\implies\) Solve with iterative algorithms (e.g. gradient descent) on the empirical risk
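A minimal sketch of this pipeline for least squares (previewing next lecture); the synthetic data and step size are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 200, 3
X = rng.standard_normal((N, d))
theta_true = np.array([1.0, -0.5, 2.0])
y = X @ theta_true + 0.1 * rng.standard_normal(N)

def empirical_risk(theta):
    """R_N(p_theta) with squared loss."""
    return np.mean((y - X @ theta) ** 2)

theta = np.zeros(d)
eta = 0.1
for _ in range(500):                        # gradient descent on the empirical risk
    grad = -2.0 / N * X.T @ (y - X @ theta)
    theta = theta - eta * grad

print(empirical_risk(theta), theta)
```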

Recap

Next time: deep dive with least-squares

01 - Overview - ML in Feedback Sys F25

By Sarah Dean
