Learning Dynamics from Bilinear Observations

Sarah Dean, Cornell University

June 2024

Large scale automated systems, powered by machine learning

historical interactions \(\to\) probability of new interaction

Feedback arises when actions impact the world

Training data is correlated due to dynamics and feedback

Outline

1. Motivation: Implications for Personalization

2. Learning Dynamics from Bilinear Observations

(figure: inputs and outputs over time)

Outline

1. Motivation: Implications for Personalization

i) Setting

ii) Assimilation

iii) Harm

Setting: Preference Dynamics

Interests may be impacted by recommended content. A recommender policy selects recommended content \(u_t\), and the user's preference state \(x_t\) determines their expressed preferences:

$$y_t = \langle x_t, u_t\rangle + v_t$$

$$x_{t+1} = f_t(x_t, u_t)$$

The bilinear observation model \(y_{ij} \approx x_i^\top u_j\) underlies factorization-based methods. Preferences \(x\in\mathcal X=\mathcal S^{d-1}\) and recommendations \(u_t\in\mathcal U\subseteq \mathcal S^{d-1}\) are unit vectors.

Examples of Preference Dynamics

  • Assimilation: interests become more similar to recommended content $$x_{t+1} \propto x_t + \eta_t u_t$$
  • Biased assimilation: interest update is proportional to affinity $$ x_{t+1} \propto x_t + \eta_t\langle x_t, u_t\rangle u_t$$
    • Proposed by Hązła et al. (2019) as a model of opinion dynamics

(figure: initial and resulting preferences)
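The two update rules above can be sketched numerically. This is a minimal illustration, not code from the talk; the step size \(\eta\) is an arbitrary choice, and normalization back onto the sphere implements the \(\propto\) notation:

```python
import numpy as np

def assimilate(x, u, eta, biased=False):
    """One preference update on the unit sphere.

    Plain assimilation:  x <- normalize(x + eta * u)
    Biased assimilation: x <- normalize(x + eta * <x, u> * u)
    """
    gain = float(x @ u) if biased else 1.0
    z = x + eta * gain * u
    return z / np.linalg.norm(z)

x = np.array([1.0, 0.0])   # initial preference
u = np.array([0.0, 1.0])   # recommended content, orthogonal to x

# plain assimilation moves x toward u...
x_plain = assimilate(x, u, eta=1.0)
# ...while biased assimilation leaves an orthogonal preference fixed,
# since the affinity gain <x, u> is zero
x_biased = assimilate(x, u, eta=1.0, biased=True)
```

Note how the affinity gain in the biased update keeps orthogonal preferences stationary, a qualitative difference between the two models.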

Prior Work

When recommendations are made globally, the outcomes differ:

  1. Assimilation \(x_{t+1} \propto x_t + \eta_t u_t\) leads to homogenized preferences
  2. Biased assimilation \(x_{t+1} \propto x_t + \eta_t\langle x_t, u_t\rangle u_t\) leads to polarization (Hązła et al. 2019; Gaitonde et al. 2021)

(figure: initial and resulting preferences under each model)

Personalized Recommendations

Regardless of whether assimilation is biased (\(x_{t+1} \propto x_t + \eta_t\langle x_t, u_t\rangle u_t\)) or not (\(x_{t+1} \propto x_t + \eta_t u_t\)), a personalized fixed recommendation \(u_t=u\) yields

$$ x_t = \alpha_t x_0 +  \beta_t u$$

with \(\alpha_t\) positive and decreasing, and \(\beta_t\) increasing in magnitude (with the same sign as \(\langle x_0, u\rangle\) under biased assimilation).

(figure: initial and resulting preferences)

Implications [DM22]

  1. It is not necessary to identify preferences to make high affinity recommendations

  2. Preferences "collapse" towards whatever users are often recommended

  3. Non-manipulation (and other goals) can be achieved through randomization
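The decomposition \(x_t = \alpha_t x_0 + \beta_t u\) can be checked numerically: under a fixed recommendation the trajectory stays in \(\mathrm{span}\{x_0, u\}\), and the coefficient on \(x_0\) shrinks. A small sketch under plain assimilation (step size and horizon are arbitrary choices):

```python
import numpy as np

def trajectory(x0, u, eta=0.3, steps=15, biased=False):
    """Iterate x_{t+1} = normalize(x_t + eta * gain * u) with u fixed."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        gain = float(x @ u) if biased else 1.0
        z = x + eta * gain * u
        x = z / np.linalg.norm(z)
        xs.append(x)
    return np.array(xs)

x0 = np.array([np.sqrt(0.5), np.sqrt(0.5), 0.0])
u = np.array([0.0, 0.0, 1.0])
xs = trajectory(x0, u)

# express each x_t in the basis [x0, u]: x_t = alpha_t * x0 + beta_t * u;
# the least-squares residuals are ~0 because the trajectory stays in the span
basis = np.stack([x0, u], axis=1)
coeffs, residuals, *_ = np.linalg.lstsq(basis, xs.T, rcond=None)
alphas, betas = coeffs
```

Here `alphas` is strictly decreasing from 1, matching the claim that preferences drift toward whatever is repeatedly recommended.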

Harmful Recommendations

Simple choice model: given a recommendation, a user

  1. Selects the recommendation with probability determined by affinity
  2. Otherwise, selects from among all content based on affinities

Preference dynamics lead to a new perspective on harm

Simple definition: harm caused by consumption of harmful content

(figure: click probabilities \(\mathbb P\{\mathrm{click}\}\) over the content catalog)
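The slides do not pin down the exact probabilities in this choice model, so the following is only one plausible instantiation: mapping the affinity \(\langle x, u\rangle \in [-1,1]\) to an acceptance probability via \((1+\langle x,u\rangle)/2\) is an assumption for illustration.

```python
import numpy as np

def choose(x, u_rec, catalog, rng):
    """Two-stage choice: accept the recommendation with probability given by
    affinity; otherwise sample from the catalog proportionally to affinity."""
    accept_prob = (1.0 + float(x @ u_rec)) / 2.0   # affinity in [-1,1] -> [0,1]
    if rng.random() < accept_prob:
        return u_rec
    weights = np.array([(1.0 + float(x @ u)) / 2.0 for u in catalog])
    probs = weights / weights.sum()
    return catalog[rng.choice(len(catalog), p=probs)]

rng = np.random.default_rng(0)
x = np.array([1.0, 0.0])
catalog = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([-1.0, 0.0])]
picked = choose(x, catalog[1], catalog, rng)
```

Any monotone map from affinity to probability would do; the point is only that consumption probability, and hence harm, depends on the evolving state \(x\).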

Harmful Recommendations

Without preference dynamics, the harm-minimizing policy is the engagement-maximizing policy (excluding harmful items).

(figure: click probabilities over the catalog under two candidate recommendations)

Harmful Recommendations

With preference dynamics, there may be downstream harm, even when no harmful content is recommended.

(figure: click probabilities over the catalog before and after each recommendation)

This motivates a new recommendation objective which takes into account the probability of future harm [CDEIKW24]


Outline

2. Learning Dynamics from Bilinear Observations

i) Setting

ii) Algorithm

iii) Results


Problem Setting

  • Unknown dynamics and measurement functions
  • Observed trajectory of inputs \(u\in\mathbb R^p\) and outputs \(y\in\mathbb R\) $$u_0,y_0,u_1,y_1,...,u_T,y_T$$
  • Goal: identify dynamics and measurement models from data
  • Setting: linear/bilinear with \(A\in\mathbb R^{n\times n}\), \(B\in\mathbb R^{n\times p}\), \(C\in\mathbb R^{p\times n}\) $$x_{t+1} = Ax_t + Bu_t + w_t\\ y_t = u_t^\top Cx_t + v_t$$

(figure: inputs \(u_t\), e.g. playlist attributes; outputs \(y_t\), e.g. listen time)
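A minimal simulator of this setting may help fix notation; dimensions, the zero initial state, and noise scales below are arbitrary choices, not from the talk:

```python
import numpy as np

def simulate(A, B, C, inputs, noise_std=0.0, seed=0):
    """Roll out x_{t+1} = A x_t + B u_t + w_t with bilinear
    observations y_t = u_t^T C x_t + v_t, starting from x_0 = 0."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    x = np.zeros(n)
    ys = []
    for u in inputs:
        v = noise_std * rng.standard_normal()
        ys.append(float(u @ C @ x) + v)        # observe, then transition
        w = noise_std * rng.standard_normal(n)
        x = A @ x + B @ u + w
    return np.array(ys)
```

Note that each scalar output mixes the state with the current input, which is exactly why identification requires the Kronecker-structured regression described next.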

Identification Algorithm

Input: data \((u_0,y_0,...,u_T,y_T)\), history length \(L\), state dim \(n\)

Step 1: Regression

$$\hat G = \arg\min_{G\in\mathbb R^{p\times pL}} \sum_{t=L}^T \big( y_t - u_t^\top \textstyle \sum_{k=1}^L G[k] u_{t-k} \big)^2 $$

Step 2: Decomposition \(\hat A,\hat B,\hat C = \mathrm{HoKalman}(\hat G, n)\)

(figure: regression uses a length-\(L\) window of past inputs at each time \(t\))
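The two steps above can be sketched as follows. This is a minimal implementation (a least-squares fit of the Markov parameters plus a textbook Ho-Kalman factorization), not the authors' code; it assumes a noiseless, exactly low-order system for the illustration to be exact:

```python
import numpy as np

def fit_markov(inputs, ys, L):
    """Step 1: least squares over features kron(ubar_{t-1}, u_t).
    Returns blocks[k] ~= C A^k B for k = 0..L-1 (each p x p)."""
    p = inputs[0].shape[0]
    Z, Y = [], []
    for t in range(L, len(ys)):
        ubar = np.concatenate([inputs[t - k] for k in range(1, L + 1)])
        Z.append(np.kron(ubar, inputs[t]))  # u_t^T G ubar = kron(ubar,u_t)^T vec(G)
        Y.append(ys[t])
    theta, *_ = np.linalg.lstsq(np.array(Z), np.array(Y), rcond=None)
    G = theta.reshape(p, p * L, order="F")  # undo column-major vec
    return [G[:, k * p:(k + 1) * p] for k in range(L)]

def ho_kalman(G_blocks, n):
    """Step 2: factor the block Hankel matrix of Markov parameters
    G_blocks[k] ~= C A^k B to recover (A, B, C) up to similarity."""
    L, p = len(G_blocks), G_blocks[0].shape[0]
    L1 = (L - 1) // 2
    L2 = L - 1 - L1
    H = np.block([[G_blocks[i + j] for j in range(L2)] for i in range(L1)])
    Hs = np.block([[G_blocks[i + j + 1] for j in range(L2)] for i in range(L1)])
    U, s, Vt = np.linalg.svd(H)
    O = U[:, :n] * np.sqrt(s[:n])            # observability factor
    Q = np.sqrt(s[:n])[:, None] * Vt[:n]     # controllability factor
    A = np.linalg.pinv(O) @ Hs @ np.linalg.pinv(Q)
    return A, Q[:, :p], O[:p, :]             # A, B, C (up to similarity)
```

Since the state-space realization is only identifiable up to a similarity transform, a sensible correctness check compares recovered Markov parameters such as \(\hat C\hat A\hat B\) rather than the matrices themselves.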

Estimation Errors

$$\hat G = \arg\min_{G\in\mathbb R^{p\times pL}} \sum_{t=L}^T \big( y_t - u_t^\top \textstyle \sum_{k=1}^L G[k] u_{t-k} \big)^2 $$

  • (Biased) estimate of Markov parameters $$ G  =\begin{bmatrix} C B & CA B & \dots & CA^{L-1} B \end{bmatrix} $$
  • Regress \(y_t\) against $$ \underbrace{ \begin{bmatrix} u_{t-1}^\top & ... & u_{t-L}^\top \end{bmatrix}}_{\bar u_{t-1}^\top } \otimes u_t^\top $$ so that each prediction is \(\bar u_{t-1}^\top \otimes u_t^\top \, \mathrm{vec}(G) \)
  • Define the data matrix $$\tilde U = \begin{bmatrix}\bar u_{L-1}^\top  \otimes u_L^\top \\ \vdots \\ \bar u_{T-1}^\top \otimes u_T^\top\end{bmatrix} $$
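The data matrix \(\tilde U\) is straightforward to build; a sketch, using `np.kron` for each row (on noiseless data with no truncation term, \(\tilde U\,\mathrm{vec}(G)\) reproduces the outputs exactly):

```python
import numpy as np

def data_matrix(inputs, L):
    """Stack rows kron(ubar_{t-1}, u_t) for t = L, ..., T."""
    rows = []
    for t in range(L, len(inputs)):
        ubar = np.concatenate([inputs[t - k] for k in range(1, L + 1)])
        rows.append(np.kron(ubar, inputs[t]))
    return np.array(rows)
```

With column-major (Fortran-order) vectorization, \(u_t^\top G \bar u_{t-1} = (\bar u_{t-1} \otimes u_t)^\top \mathrm{vec}(G)\), which is the identity the row construction relies on.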

Main Results

Assumptions:

  1. Process and measurement noise \(w_t,v_t\) are i.i.d., zero mean, and have bounded second moments
  2. Inputs \(u_t\) are bounded
  3. The dynamics \((A,B,C)\) are observable, controllable, and strictly stable, i.e. \(\rho(A)<1\)

Informal Theorem (Markov parameter estimation)

With probability at least \(1-\delta\), $$\epsilon_G=\|G-\hat G\|_{\tilde U^\top \tilde U} \lesssim \sqrt{ \frac{p^2 L}{\delta} \cdot c_{\mathrm{stability,noise}} }+ \rho(A)^L\sqrt{T} c_{\mathrm{stability}}$$

Main Results


Informal Theorem (system identification)

Suppose \(L\) is sufficiently large. Then, there exists a nonsingular matrix \(S\) (i.e. a similarity transform) such that

$$\|A-S\hat AS^{-1}\|_{F},\quad \| B-S\hat B\|_{F},\quad \| C-\hat CS^{-1}\|_{F} \;\lesssim\; c_{\mathrm{contr,obs,dim}}  \frac{\|G-\hat G\|_{F}}{\sqrt{\sigma_{\min}(\tilde U^\top \tilde U)}} $$

Main Results


Informal Summary Theorem

With high probability, $$\mathrm{est.~errors} \lesssim \sqrt{ \frac{\mathsf{poly}(\mathrm{dimension})}{\sigma_{\min}(\tilde U^\top \tilde U)}}$$

Sample Complexity

Informal Conjecture

When \(u_t\) are chosen i.i.d. and sub-Gaussian and \(T\) is large enough, whp $$\sigma_{\min}({\tilde U^\top \tilde U} )\gtrsim T$$

Informal Corollary

For i.i.d. and sub-Gaussian inputs, whp $$\mathrm{est.~errors} \lesssim \sqrt{ \frac{\mathsf{poly}(\mathrm{dim.})}{T}}$$

How large does \(T\) need to be to guarantee bounded estimation error?

Formal analysis involves the structured random matrix \(\tilde U\).
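The conjecture is easy to probe empirically. A quick check with i.i.d. Gaussian inputs (the dimensions below are arbitrary) shows \(\sigma_{\min}(\tilde U^\top \tilde U)\) growing with \(T\), even though the rows of \(\tilde U\) are dependent through the overlapping input windows:

```python
import numpy as np

def sigma_min_gram(T, p=2, L=3, seed=0):
    """sigma_min(U~^T U~) for T rows built from i.i.d. Gaussian inputs."""
    rng = np.random.default_rng(seed)
    us = rng.standard_normal((T + L, p))
    rows = [np.kron(np.concatenate([us[t - k] for k in range(1, L + 1)]), us[t])
            for t in range(L, T + L)]
    U = np.array(rows)
    # smallest singular value of U, squared, equals sigma_min of the Gram matrix
    return np.linalg.svd(U, compute_uv=False)[-1] ** 2
```

This is only a numerical sanity check, of course, not a substitute for the formal analysis of the structured random matrix \(\tilde U\).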

Conclusion & Discussion

  1. Motivation: preference dynamics
    • Assimilation dynamics & harm
  2. Learning from bilinear observations

Directions for discussion:

  • sample complexity
  • marginal stability
  • prediction (filtering)
  • optimal & adaptive control
  • applications

  1. Preference Dynamics Under Personalized Recommendations at EC22 (arxiv:2205.13026) with Jamie Morgenstern
  2. Harm Mitigation in Recommender Systems under User Preference Dynamics at KDD24 (arxiv:2406.09882) with Jerry Chee, Sindhu Ernala, Stratis Ioannidis, Shankar Kalyanaraman, Udi Weinsberg
  3. Learning Linear Dynamics from Bilinear Observations (poster here!) with Yahya Sattar

Other References

  • Gaitonde, Kleinberg, Tardos, 2021. Polarization in geometric opinion dynamics. EC.
  • Hązła, Jin, Mossel, Ramnarayan, 2019. A geometric model of opinion polarization. Mathematics of Operations Research.
  • Oymak & Ozay, 2019. Non-asymptotic Identification of LTI Systems from a Single Trajectory. ACC.

Thanks! Questions?

(backup slides: more details on affinity maximization, preference stationarity, and mode collapse)

Proof Sketch (following Oymak & Ozay, 2019)

$$\hat G = \arg\min_{G\in\mathbb R^{p\times pL}} \sum_{t=L}^T \big( y_t - \bar u_{t-1}^\top \otimes u_t^\top \mathrm{vec}(G)  \big)^2 $$

  • Claim: this is a biased estimate of Markov parameters $$ G_\star  =\begin{bmatrix} C B & CA B & \dots & CA^{L-1} B \end{bmatrix} $$
    • Observe that \(x_t = \sum_{k=1}^L A^{k-1} (B u_{t-k} + w_{t-k}) + A^L x_{t-L}\)
    • Hence, \(y_t= \bar u_{t-1}^\top \otimes u_t^\top \mathrm{vec}(G_\star) +u_t^\top \textstyle \sum_{k=1}^L CA^{k-1} w_{t-k} + u_t^\top CA^L x_{t-L} + v_t \)
  • Least squares: for \(y_t = z_t^\top \theta + n_t\), the estimate  \(\hat\theta=\arg \min\sum_t (z_t^\top \theta - y_t)^2\) $$= \textstyle\arg \min  \|Z \theta - Y\|^2_2 = (Z^\top Z)^\dagger Z^\top Y= \theta_\star + (Z^\top Z)^\dagger Z^\top N$$
  • Estimation errors are therefore \(\|G_\star -\hat G\|_{\tilde U^\top\tilde U} = \|(\tilde U^\top \tilde U)^{\dagger/2}\tilde U^\top N\| \)

Equivalent representations

The set of equivalent state-space representations, for any invertible (square) matrix \(M\):

\(s_{t+1} = As_t + Bw_t\)

\(y_t = Cs_t+v_t\)

\(\tilde s_{t+1} = \tilde A\tilde s_t + \tilde B w_t\)

\(y_t = \tilde C\tilde s_t+v_t\)

\(\tilde s = M^{-1}s, \quad \tilde A = M^{-1}AM, \quad \tilde B = M^{-1}B, \quad \tilde C = CM\)

\( s = M\tilde s, \quad A = M\tilde AM^{-1}, \quad B = M\tilde B, \quad C = \tilde CM^{-1}\)

(figure: block diagrams of the two equivalent realizations)
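The equivalence can be verified directly: the Markov parameters \(CA^kB\), and hence the input-output behavior, are unchanged under any invertible \(M\). A quick numerical check with arbitrary random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 3, 2
A = 0.5 * rng.standard_normal((n, n))
B = rng.standard_normal((n, p))
C = rng.standard_normal((p, n))
M = rng.standard_normal((n, n))          # invertible with probability one
A_t = np.linalg.inv(M) @ A @ M           # tilde A = M^{-1} A M
B_t = np.linalg.inv(M) @ B               # tilde B = M^{-1} B
C_t = C @ M                              # tilde C = C M

# Markov parameters agree: C A^k B == C~ A~^k B~, since the M's cancel
for k in range(4):
    assert np.allclose(C @ np.linalg.matrix_power(A, k) @ B,
                       C_t @ np.linalg.matrix_power(A_t, k) @ B_t)
```

This is why the identification guarantees are stated up to a similarity transform \(S\): the data can never distinguish between realizations in this set.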

OLC Talk: Learning Dynamics from Bilinear Observations

By Sarah Dean
