# Learning Dynamics from Bilinear Observations

### Sarah Dean, Cornell University

June 2024

(Figure: historical interactions $$\to$$ probability of new interaction)

## Outline

1. Motivation: Implications for Personalization

2. Learning Dynamics from Bilinear Observations

(Figure: trajectory of inputs and outputs over time)


## Setting: Preference Dynamics

Interests may be impacted by recommended content.

A recommender policy selects recommended content $$u_t$$; the user's expressed preferences are $$y_t = \langle x_t, u_t\rangle + v_t$$, determined by a preference state $$x_t$$. This bilinear form $$y_{ij} \approx x_i^\top u_j$$ underlies factorization-based methods. As content is recommended, the state $$x_t$$ updates to $$x_{t+1}$$.
## Setting: Preference Dynamics

$$y_t = \langle x_t, u_t\rangle + v_t$$

$$x_{t+1} = f_t(x_t, u_t)$$

preferences $$x\in\mathcal X=\mathcal S^{d-1}$$

recommendations $$u_t\in\mathcal U\subseteq \mathcal S^{d-1}$$

## Examples of Preference Dynamics

• Assimilation: interests become more similar to recommended content: $$x_{t+1} \propto x_t + \eta_t u_t$$
• Biased assimilation: the interest update is proportional to affinity: $$x_{t+1} \propto x_t + \eta_t\langle x_t, u_t\rangle u_t$$
• Proposed by Hązła et al. (2019) as a model of opinion dynamics

(Figure: initial vs. resulting preferences)
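Both update rules are a one-line simulation. The sketch below is my own illustrative code (the step size $$\eta=0.5$$ and dimensions are arbitrary); it shows that under a fixed recommendation, biased assimilation drives the affinity $$|\langle x_t, u\rangle|$$ toward 1:

```python
import numpy as np

def assimilation_step(x, u, eta=0.5):
    """Plain assimilation: x <- normalize(x + eta * u)."""
    x_new = x + eta * u
    return x_new / np.linalg.norm(x_new)

def biased_assimilation_step(x, u, eta=0.5):
    """Biased assimilation: update proportional to the affinity <x, u>."""
    x_new = x + eta * np.dot(x, u) * u
    return x_new / np.linalg.norm(x_new)

rng = np.random.default_rng(0)
u = np.array([1.0, 0.0])        # fixed recommendation on the unit sphere
x = rng.standard_normal(2)
x /= np.linalg.norm(x)
for _ in range(50):
    x = biased_assimilation_step(x, u)
print(abs(np.dot(x, u)))        # affinity approaches 1
```

Note that under biased assimilation the preference aligns with $$u$$ or $$-u$$ depending on the sign of $$\langle x_0, u\rangle$$, which is why the affinity's absolute value is the relevant quantity.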

## Prior Work

When recommendations are made globally, the outcomes differ:

1. Assimilation $$x_{t+1} \propto x_t + \eta_t u_t$$ leads to homogenized preferences
2. Biased assimilation $$x_{t+1} \propto x_t + \eta_t\langle x_t, u_t\rangle u_t$$ leads to polarization (Hązła et al. 2019; Gaitonde et al. 2021)

(Figure: initial vs. resulting preferences)

## Personalized Recommendations

Regardless of whether assimilation is biased ($$x_{t+1} \propto x_t + \eta_t\langle x_t, u_t\rangle u_t$$) or not ($$x_{t+1} \propto x_t + \eta_t u_t$$), a personalized fixed recommendation $$u_t=u$$ yields

$$x_t = \alpha_t x_0 + \beta_t u$$

where $$\alpha_t$$ is positive and decreasing, and $$\beta_t$$ has increasing magnitude (with the same sign as $$\langle x_0, u\rangle$$ under biased assimilation).
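The decomposition $$x_t = \alpha_t x_0 + \beta_t u$$ can be checked numerically: under a fixed recommendation the state never leaves $$\mathrm{span}\{x_0, u\}$$. A small sketch (illustrative dimensions and step size, my own code):

```python
import numpy as np

rng = np.random.default_rng(5)
x0 = rng.standard_normal(4); x0 /= np.linalg.norm(x0)
u = rng.standard_normal(4); u /= np.linalg.norm(u)

x = x0.copy()
for _ in range(30):
    x = x + 0.5 * u              # plain assimilation, eta = 0.5
    x /= np.linalg.norm(x)

# Project out span{x0, u}; the residual is numerically zero,
# confirming x_t = alpha_t * x0 + beta_t * u.
Q, _ = np.linalg.qr(np.column_stack([x0, u]))
residual = np.linalg.norm(x - Q @ (Q.T @ x))
```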
## Personalized Recommendations

Regardless of whether assimilation is biased, the following implications hold [DM22]:

1. It is not necessary to identify preferences to make high-affinity recommendations

2. Preferences "collapse" towards whatever users are often recommended

3. Non-manipulation (and other goals) can be achieved through randomization

(Figure: initial vs. resulting preferences)

## Harmful Recommendations

Simple choice model: given a recommendation, a user

1. Selects the recommendation with probability determined by affinity
2. Otherwise, selects from among all content based on affinities

Preference dynamics lead to a new perspective on harm

Simple definition: harm caused by consumption of harmful content

(Figure: click probabilities $$\mathbb P\{\mathrm{click}\}$$ across content items)
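The two-stage choice model can be sketched as follows. The exact functional forms here (affinity as the acceptance probability, affinity-proportional organic choice) are my assumptions for illustration, not taken from the talk:

```python
import numpy as np

def click_probabilities(affinities, rec_idx):
    """Two-stage choice: accept the recommendation with probability equal
    to its affinity (assumed to lie in [0, 1]); otherwise choose among
    all items with probability proportional to affinity."""
    a = np.asarray(affinities, dtype=float)
    p_accept = a[rec_idx]
    organic = a / a.sum()                  # affinity-proportional choice
    probs = (1.0 - p_accept) * organic
    probs[rec_idx] += p_accept
    return probs

probs = click_probabilities([0.6, 0.3, 0.1], rec_idx=0)
```

Recommending a high-affinity item thus concentrates clicks on it, while low-affinity items still receive some organic consumption.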

## Harmful Recommendations

Without preference dynamics, the harm-minimizing policy is the engagement-maximizing policy (excluding harmful items)

(Figure: click probabilities $$\mathbb P\{\mathrm{click}\}$$ for each item under recommendation ♫ vs. recommendation 𝅘𝅥)


## Harmful Recommendations

With preference dynamics, there may be downstream harm, even when no harmful content is recommended

(Figure: click probabilities $$\mathbb P\{\mathrm{click}\}$$ for each item under recommendation ♫ vs. recommendation 𝅘𝅥)

This motivates a new recommendation objective which takes into account the probability of future harm [CDEIKW24]

## Outline

1. Motivation: Implications for Personalization

2. Learning Dynamics from Bilinear Observations


## Problem Setting

• Unknown dynamics and measurement functions
• Observed trajectory of inputs $$u\in\mathbb R^p$$ and outputs $$y\in\mathbb R$$ $$u_0,y_0,u_1,y_1,...,u_T,y_T$$
• Goal: identify dynamics and measurement models from data
• Setting: linear/bilinear with $$A\in\mathbb R^{n\times n}$$, $$B\in\mathbb R^{n\times p}$$, $$C\in\mathbb R^{p\times n}$$: $$x_{t+1} = Ax_t + Bu_t + w_t,\qquad y_t = u_t^\top Cx_t + v_t$$

(Figure: trajectory of inputs $$u_t$$, e.g. playlist attributes, and outputs $$y_t$$, e.g. listen time)
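A minimal simulator for this model (dimensions, noise scales, and the diagonal $$A$$ are illustrative choices of mine):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, T = 3, 2, 500
A = 0.5 * np.eye(n)                  # strictly stable: rho(A) < 1
B = rng.standard_normal((n, p))
C = rng.standard_normal((p, n))

x = np.zeros(n)
us, ys = [], []
for t in range(T):
    u = rng.standard_normal(p)
    y = u @ C @ x + 0.01 * rng.standard_normal()        # y_t = u_t^T C x_t + v_t
    x = A @ x + B @ u + 0.01 * rng.standard_normal(n)   # x_{t+1} = A x_t + B u_t + w_t
    us.append(u)
    ys.append(y)
```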

## Identification Algorithm

Input: data $$(u_0,y_0,...,u_T,y_T)$$, history length $$L$$, state dim $$n$$

Step 1: Regression

$$\hat G = \arg\min_{G\in\mathbb R^{p\times pL}} \sum_{t=L}^T \big( y_t - u_t^\top \textstyle \sum_{k=1}^L G[k] u_{t-k} \big)^2$$

Step 2: Decomposition $$\hat A,\hat B,\hat C = \mathrm{HoKalman}(\hat G, n)$$

(Figure: a length-$$L$$ window of past inputs used to predict each output $$y_t$$)
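Step 1 reduces to ordinary least squares over the features $$\bar u_{t-1} \otimes u_t$$. A sketch (my own code; the column-major vec convention makes $$u_t^\top G\,\bar u_{t-1} = (\bar u_{t-1}\otimes u_t)^\top \mathrm{vec}(G)$$):

```python
import numpy as np

def estimate_markov_params(us, ys, L, p):
    """Step 1: least-squares estimate of G = [G[1] ... G[L]] in R^{p x pL}
    from observations y_t ~ u_t^T sum_k G[k] u_{t-k}."""
    Z, Y = [], []
    for t in range(L, len(ys)):
        ubar = np.concatenate([us[t - k] for k in range(1, L + 1)])
        Z.append(np.kron(ubar, us[t]))   # feature: ubar_{t-1} (x) u_t
        Y.append(ys[t])
    g, *_ = np.linalg.lstsq(np.array(Z), np.array(Y), rcond=None)
    return g.reshape(p, p * L, order="F")   # undo the column-major vec
```

(Step 2 would pass $$\hat G$$ to a Ho-Kalman decomposition, omitted here.)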

## Estimation Errors

$$\hat G = \arg\min_{G\in\mathbb R^{p\times pL}} \sum_{t=L}^T \big( y_t - u_t^\top \textstyle \sum_{k=1}^L G[k] u_{t-k} \big)^2$$

• (Biased) estimate of Markov parameters $$G =\begin{bmatrix} C B & CA B & \dots & CA^{L-1} B \end{bmatrix}$$
• Regress $$y_t$$ against $$\underbrace{ \begin{bmatrix} u_{t-1}^\top & ... & u_{t-L}^\top \end{bmatrix}}_{\bar u_{t-1}^\top } \otimes u_t^\top$$

(Figure: over each length-$$L$$ window, the prediction is $$\bar u_{t-1}^\top \otimes u_t^\top\, \mathrm{vec}(G)$$)

## Estimation Errors

$$\hat G = \arg\min_{G\in\mathbb R^{p\times pL}} \sum_{t=L}^T \big( y_t - u_t^\top \textstyle \sum_{k=1}^L G[k] u_{t-k} \big)^2$$

• (Biased) estimate of Markov parameters $$G =\begin{bmatrix} C B & CA B & \dots & CA^{L-1} B \end{bmatrix}$$
• Define the data matrix $$\tilde U = \begin{bmatrix}\bar u_{L-1}^\top \otimes u_L^\top \\ \vdots \\ \bar u_{T-1}^\top \otimes u_T^\top\end{bmatrix}$$

(Figure: the regression in matrix form, with rows $$\bar u_{t-1}^\top \otimes u_t^\top$$ of $$\tilde U$$ multiplying $$\mathrm{vec}(G)$$)

## Main Results

Assumptions:

1. Process and measurement noise $$w_t,v_t$$ are i.i.d., zero mean, and have bounded second moments
2. Inputs $$u_t$$ are bounded
3. The dynamics $$(A,B,C)$$ are observable, controllable, and strictly stable, i.e. $$\rho(A)<1$$

### Informal Theorem (Markov parameter estimation)

With probability at least $$1-\delta$$, $$\epsilon_G=\|G-\hat G\|_{\tilde U^\top \tilde U} \lesssim \sqrt{ \frac{p^2 L}{\delta} \cdot c_{\mathrm{stability,noise}} }+ \rho(A)^L\sqrt{T} c_{\mathrm{stability}}$$

## Main Results

Assumptions:

1. Process and measurement noise $$w_t,v_t$$ are i.i.d., zero mean, and have bounded second moments
2. Inputs $$u_t$$ are bounded
3. The dynamics $$(A,B,C)$$ are observable, controllable, and strictly stable, i.e. $$\rho(A)<1$$

### Informal Theorem (system identification)

Suppose $$L$$ is sufficiently large. Then, there exists a nonsingular matrix $$S$$ (i.e. a similarity transform) such that

$$\|A-S\hat AS^{-1}\|,\quad \| B-S\hat B\|,\quad \| C-\hat CS^{-1}\| \;\lesssim\; c_{\mathrm{contr,obs,dim}} \underbrace{\|G-\hat G\|_{\tilde U^\top \tilde U}}_{\epsilon_G}$$

## Main Results

Assumptions:

1. Process and measurement noise $$w_t,v_t$$ are i.i.d., zero mean, and have bounded second moments
2. Inputs $$u_t$$ are bounded
3. The dynamics $$(A,B,C)$$ are observable, controllable, and strictly stable, i.e. $$\rho(A)<1$$

### Informal Summary Theorem

With high probability, $$\mathrm{est.~errors} \lesssim \sqrt{ \frac{\mathsf{poly}(\mathrm{dimension})}{\sigma_{\min}(\tilde U^\top \tilde U)}}$$

## Sample Complexity

### Informal Conjecture

When $$u_t$$ are chosen i.i.d. and sub-Gaussian and $$T$$ is large enough, whp $$\sigma_{\min}({\tilde U^\top \tilde U} )\gtrsim T$$

### Informal Corollary

For i.i.d. and sub-Gaussian inputs, whp $$\mathrm{est.~errors} \lesssim \sqrt{ \frac{\mathsf{poly}(\mathrm{dim.})}{T}}$$

How large does $$T$$ need to be to guarantee bounded estimation error?

Formal analysis involves the structured random matrix $$\tilde U$$.

## Conclusion & Discussion

1. Motivation: preference dynamics
   • Assimilation dynamics & harm
2. Learning from bilinear observations
   • sample complexity
   • marginal stability
   • prediction (filtering)
   • applications

1. Preference Dynamics Under Personalized Recommendations at EC22 (arxiv:2205.13026) with Jamie Morgenstern
2. Harm Mitigation in Recommender Systems under User Preference Dynamics at KDD24 (arxiv:2406.09882) with Jerry Chee, Sindhu Ernala, Stratis Ioannidis, Shankar Kalyanaraman, Udi Weinsberg
3. Learning Linear Dynamics from Bilinear Observations (poster here!) with Yahya Sattar

Other References

• Gaitonde, Kleinberg, Tardos, 2021. Polarization in geometric opinion dynamics. EC.
• Hązła, Jin, Mossel, Ramnarayan, 2019. A geometric model of opinion polarization. Mathematics of Operations Research.
• Oymak & Ozay, 2019. Non-asymptotic Identification of LTI Systems from a Single Trajectory. ACC.

## References

more details on affinity maximization, preference stationarity, and mode collapse

(Oymak & Ozay, 2019)

## Proof Sketch

$$\hat G = \arg\min_{G\in\mathbb R^{p\times pL}} \sum_{t=L}^T \big( y_t - \bar u_{t-1}^\top \otimes u_t^\top \mathrm{vec}(G) \big)^2$$

• Claim: this is a biased estimate of Markov parameters $$G_\star =\begin{bmatrix} C B & CA B & \dots & CA^{L-1} B \end{bmatrix}$$
• Observe that $$x_t = \sum_{k=1}^L A^{k-1} (B u_{t-k} + w_{t-k}) + A^L x_{t-L}$$
• Hence, $$y_t= \bar u_{t-1}^\top \otimes u_t^\top \mathrm{vec}(G_\star) +u_t^\top \textstyle \sum_{k=1}^L CA^{k-1} w_{t-k} + u_t^\top CA^L x_{t-L} + v_t$$
• Least squares: for $$y_t = z_t^\top \theta + n_t$$, the estimate  $$\hat\theta=\arg \min\sum_t (z_t^\top \theta - y_t)^2$$ $$= \textstyle\arg \min \|Z \theta - Y\|^2_2 = (Z^\top Z)^\dagger Z^\top Y= \theta_\star + (Z^\top Z)^\dagger Z^\top N$$
• Estimation errors are therefore $$\|G_\star -\hat G\|_{\tilde U^\top\tilde U} = \|\tilde U^\top N\|$$
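The least-squares decomposition $$\hat\theta = \theta_\star + (Z^\top Z)^\dagger Z^\top N$$ used above can be verified numerically (arbitrary small dimensions, my own sketch):

```python
import numpy as np

# Verify theta_hat = theta_* + (Z^T Z)^dagger Z^T N  when  Y = Z theta_* + N.
rng = np.random.default_rng(3)
T, d = 100, 4
Z = rng.standard_normal((T, d))
theta_star = rng.standard_normal(d)
N = 0.1 * rng.standard_normal(T)
Y = Z @ theta_star + N

pinv = np.linalg.pinv(Z.T @ Z)
theta_hat = pinv @ Z.T @ Y       # closed-form least-squares estimate
```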

## Equivalent representations

Set of equivalent state space representations for all invertible and square $$M$$

$$s_{t+1} = As_t + Bw_t$$

$$y_t = Cs_t+v_t$$

$$\tilde s_{t+1} = \tilde A\tilde s_t + \tilde B w_t$$

$$y_t = \tilde C\tilde s_t+v_t$$

$$\tilde s = M^{-1}s, \qquad \tilde A = M^{-1}AM, \qquad \tilde B = M^{-1}B, \qquad \tilde C = CM$$

$$s = M\tilde s, \qquad A = M\tilde AM^{-1}, \qquad B = M\tilde B, \qquad C = \tilde CM^{-1}$$

(Figure: block diagrams of the two equivalent realizations $$(A, B, C)$$ and $$(\tilde A, \tilde B, \tilde C)$$)
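The equivalence can be checked by comparing Markov parameters, which are invariant to the choice of $$M$$ (a quick numerical sketch with arbitrary dimensions):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 3, 2
A = 0.5 * rng.standard_normal((n, n))
B = rng.standard_normal((n, p))
C = rng.standard_normal((p, n))
M = np.eye(n) + 0.1 * rng.standard_normal((n, n))  # small perturbation of I: invertible

Minv = np.linalg.inv(M)
At, Bt, Ct = Minv @ A @ M, Minv @ B, C @ M
# C A^k B == Ct At^k Bt for all k: both realizations share the same I/O map.
ok = all(
    np.allclose(C @ np.linalg.matrix_power(A, k) @ B,
                Ct @ np.linalg.matrix_power(At, k) @ Bt)
    for k in range(5)
)
```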
