Fall 2025, Prof. Sarah Dean (Assistant Professor of Computer Science, Cornell)
[Diagram: an automated system interacts with its environment, taking actions and receiving measurements. Training data \(\{(x_i, y_i)\}\) is used to fit a model \(p:\mathcal X\to\mathcal Y\) mapping features to a predicted label.]
Training algorithms:
Auto-regressive models:
[Diagram: a sequence of models trained as data arrives: \(p_{\theta_0}:\mathcal X\to\mathcal Y\) maps \(x_1\) to \(\hat y_1\), compared against \(y_1\); the updated \(p_{\theta_1}\) maps \(x_2\) to \(\hat y_2\), compared against \(y_2\); \(p_{\theta_2}\) maps \(x_3\) to \(\hat y_3\), compared against \(y_3\); and so on.]
[Diagram: training data \(\{(x_i, y_i)\}\) is used to fit a model \(p:\mathcal X\to\mathcal Y\), which is deployed as a policy mapping observations to actions.]
Supervised learning notation: training data \(\{(x_i, y_i)\}\), sampled i.i.d. from \(\mathcal D\), is used to fit a model \(p:\mathcal X\to\mathcal Y\); at deployment, an observation \(x\sim\mathcal D_{x}\) is mapped to a prediction \(\hat y = p(x)\).
Goal: for a new sample \(x,y\sim \mathcal D\), the prediction \(\hat y = p(x)\) is close to the true \(y\).
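A minimal sketch of this setup in Python, assuming scalar regression with squared loss (the distribution, predictor, and sample size are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative distribution D: x ~ N(0, 1), y = 2x + noise (an assumption)
def sample_D(n):
    x = rng.normal(size=n)
    y = 2.0 * x + 0.1 * rng.normal(size=n)
    return x, y

# A candidate predictor p: X -> Y (deliberately slightly off)
def p(x):
    return 1.9 * x

# Average squared loss on fresh i.i.d. samples estimates E[(y - p(x))^2]
x, y = sample_D(10_000)
print("estimated risk of p:", np.mean((y - p(x)) ** 2))
```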
Autoregressive prediction notation: observations \(\{x_t\}\) accumulate over time, with \(x_t\sim\mathcal D_{t}\); a model \(p:\mathcal X^t\to\mathcal X\) maps the observed history to a prediction \(\hat x_t\) of the next observation.
Goal: the prediction \(\hat x_t\) is close to the true \(x_t\sim\mathcal D_t\).
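As a concrete sketch, a next-step predictor that fits a single autoregressive coefficient to the observed history (the data-generating process and the one-step window are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sequence: x_t = 0.8 * x_{t-1} + noise (an assumption)
T = 500
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.8 * x[t - 1] + 0.1 * rng.normal()

# Fit an AR(1) coefficient by least squares on the history,
# then predict the next observation from the most recent one.
a_hat = np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])
print(f"fitted coefficient: {a_hat:.3f}")
print(f"next-step prediction: {a_hat * x[-1]:.3f}")
```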
Online learning notation: at each time \(t\), the current model \(p_t:\mathcal X\to\mathcal Y\) maps observation \(x_t\) to prediction \(\hat y_t\), and the data \(\{(x_t, y_t)\}\) accumulate.
Goal: cumulatively over time, the predictions \(\hat y_t = p_t(x_t)\) are close to the true \(y_t\).
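A minimal sketch of this protocol with a scalar linear model updated by one gradient step per round (the data stream and step size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
theta = 0.0   # current model: p_t(x) = theta * x
eta = 0.05    # step size
cum_loss = 0.0
T = 1000

for t in range(T):
    x_t = rng.normal()
    y_hat = theta * x_t                     # predict with current p_t
    y_t = 1.5 * x_t + 0.1 * rng.normal()    # true label revealed afterward
    cum_loss += (y_hat - y_t) ** 2          # suffer squared loss
    theta -= eta * 2 * (y_hat - y_t) * x_t  # gradient update: p_t -> p_{t+1}

print("average loss over the stream:", cum_loss / T)
```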
Bandit notation: at each time \(t\), a policy \(\pi_t:\mathcal X\to\mathcal A\) maps observation \(x_t\) to action \(a_t\), and the data \(\{(x_t, a_t, r_t)\}\) accumulate.
Goal: cumulatively over time, the actions \(a_t = \pi_t(x_t)\) achieve high reward.
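One standard way to trade off exploration and exploitation here is \(\varepsilon\)-greedy selection; the sketch below drops the context \(x_t\) and assumes a small action set with Bernoulli rewards (all illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
true_means = np.array([0.2, 0.5, 0.8])  # unknown to the learner (an assumption)
K, eps = len(true_means), 0.1
counts = np.zeros(K)
means = np.zeros(K)  # running reward estimates

for t in range(5000):
    if rng.random() < eps:
        a = int(rng.integers(K))   # explore: random action
    else:
        a = int(np.argmax(means))  # exploit: best estimate so far
    r = float(rng.random() < true_means[a])  # Bernoulli reward
    counts[a] += 1
    means[a] += (r - means[a]) / counts[a]   # incremental mean update

print("estimated reward means:", means.round(2))
```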
Reinforcement learning notation: at each time \(t\), a policy \(\pi_t:\mathcal X^t\to\mathcal A\) maps the observation history to action \(a_t\), and the data \(\{(x_t, a_t, r_t)\}\) accumulate.
Goal: select actions \(a_t\) that bring the environment to high-reward states.
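A minimal tabular Q-learning sketch on a toy chain environment (the environment, horizon, and hyperparameters are all assumptions for illustration, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy chain: states 0..4, actions {0: left, 1: right}, reward 1 at state 4
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.2

for episode in range(500):
    s = 0
    for step in range(20):
        # epsilon-greedy action selection
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # temporal-difference update toward r + gamma * max_a' Q(s', a')
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print("greedy action per state:", np.argmax(Q, axis=1))
```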
Participation expectation: actively ask questions and contribute to discussions
What we do (stochastic gradient descent):
Initialize \(\theta_0\)
For \(t=0,\dots,T\):
Sample \((x_i, y_i)\) from the dataset and update $$\theta_{t+1} = \theta_t - \eta \nabla_\theta \ell(y_i, p_{\theta}(x_i))\big|_{\theta=\theta_t}$$
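A minimal sketch of this loop for least-squares regression (anticipating next lecture); the synthetic data, step size, and iteration count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic dataset: y = <theta_star, x> + noise (an assumption)
N, d = 200, 2
theta_star = np.array([1.0, -2.0])
X = rng.normal(size=(N, d))
y = X @ theta_star + 0.1 * rng.normal(size=N)

theta = np.zeros(d)  # initialize theta_0
eta = 0.01           # step size
for t in range(2000):
    i = rng.integers(N)                      # sample (x_i, y_i) from the dataset
    grad = 2 * (X[i] @ theta - y[i]) * X[i]  # gradient of (y_i - <theta, x_i>)^2
    theta = theta - eta * grad               # SGD update

print("estimate:", theta.round(2), "target:", theta_star)
```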
Why we do it:
$$\widehat p \approx \arg\min_{p\in\mathcal P} \underbrace{\frac{1}{N}\sum_{i=1}^N \ell(y_i, p(x_i))}_{\approx\, \mathbb E[\ell(y, p(x))]}, \quad\mathcal P = \{p_\theta \mid \theta\in\mathbb R^d\}$$
Recall the goal: for a new sample \(x,y\sim \mathcal D\), the prediction \(\hat y\) is close to the true \(y\).
\(\ell(y,\hat y)\) measures the "loss" of predicting \(\hat y\) when the label is actually \(y\).
Ex (classification): the zero-one loss \(\ell(y,\hat y) = \mathbf{1}\{\hat y \neq y\}\).
Ex (regression): the squared loss \(\ell(y,\hat y) = (y - \hat y)^2\).
The risk of a predictor \(p\) over a distribution \(\mathcal D\) is the expected (average) loss
$$\mathcal R(p) = \mathbb E_{x,y\sim\mathcal D}[\ell(y, p(x))]$$
Claim: the predictor with the lowest possible risk is the Bayes optimal predictor \(p^*(x) = \arg\min_{\hat y\in\mathcal Y} \mathbb E[\ell(y, \hat y) \mid x]\).
Proof: exercise. Hint: use the tower property of expectation.
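For squared loss, this claim specializes to the conditional mean, \(p^*(x) = \mathbb E[y \mid x]\); a sketch of the argument: by the tower property,
$$\mathcal R(p) = \mathbb E_{x,y}\big[(y - p(x))^2\big] = \mathbb E_x\Big[\mathbb E\big[(y - p(x))^2 \mid x\big]\Big],$$
so it suffices to minimize the inner expectation separately for each \(x\), and for fixed \(x\) the map \(\hat y \mapsto \mathbb E[(y - \hat y)^2 \mid x]\) is minimized at \(\hat y = \mathbb E[y \mid x]\).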
Goal: for new sample \(x,y\sim \mathcal D\), prediction \(\hat y = p(x)\) is close to true \(y\)
\(\ell(y,\hat y)\) measures "loss" of predicting \(\hat y\) when it's actually \(y\)
\(\implies\) Encode our goal in the risk minimization framework:
$$\min_{p\in\mathcal P}\mathcal R(p) = \mathbb E_{x,y\sim\mathcal D}[\ell(y, p(x))]$$
\(\approx \mathcal R_N(p) = \frac{1}{N}\sum_{i=1}^N \ell(y_i, p(x_i))\)
1. Representation: which model class \(\mathcal P\) to search over?
2. Optimization: how to (approximately) minimize the empirical risk?
3. Generalization: does low empirical risk \(\mathcal R_N\) imply low risk \(\mathcal R\)?
\(\implies\) Solve with iterative algorithms (e.g. gradient descent) on the empirical risk
Next time: a deep dive into least-squares