Sarah Dean
asst prof in CS at Cornell
June 2024
historical interactions
probability of new interaction
1. Motivation: Implications for Personalization
2. Learning Dynamics from Bilinear Observations
1. Motivation: Implications for Personalization
Interests may be impacted by recommended content
preference state \(x_t\)
expressed preferences
recommended content
recommender policy
\(y_t = \langle x_t, u_t\rangle + v_t \)
underlies factorization-based methods
state \(x_t\) updates to \(x_{t+1}\)
\(y_t = \langle x_t, u_t\rangle + v_t \)
\(x_{t+1} = f_t(x_t, u_t)\)
preferences \(x\in\mathcal X=\mathcal S^{d-1}\)
recommendations \(u_t\in\mathcal U\subseteq \mathcal S^{d-1}\)
1. Assimilation
\(x_{t+1} \propto x_t + \eta_t u_t\)
2. Biased assimilation
\(x_{t+1} \propto x_t + \eta_t\langle x_t, u_t\rangle u_t\)
When recommendations are made globally, the outcomes differ:
1. Assimilation: homogenized preferences
2. Biased assimilation: polarization (Hązła et al. 2019; Gaitonde et al. 2021)
[Figures: initial preference and resulting preference under each model]
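A minimal simulation sketch of the two update rules may help make the contrast concrete. It assumes a constant step size `eta`, a single fixed recommendation `u` shown to every user, and two hypothetical users (`x_like`, `x_dislike`) who differ only in the sign of their initial affinity for `u`. None of these specific values come from the talk.

```python
import numpy as np

def normalize(x):
    return x / np.linalg.norm(x)

def simulate(x0, u, eta=0.5, T=200, biased=False):
    """Iterate x_{t+1} ∝ x_t + w_t u, with w_t = eta (assimilation)
    or w_t = eta * <x_t, u> (biased assimilation)."""
    x = normalize(x0)
    for _ in range(T):
        weight = eta * (x @ u) if biased else eta
        x = normalize(x + weight * u)
    return x

rng = np.random.default_rng(0)
d = 5
u = normalize(rng.standard_normal(d))   # one recommendation shown to everyone
z = rng.standard_normal(d)
v = normalize(z - (z @ u) * u)          # unit vector orthogonal to u
x_like = normalize(0.3 * u + v)         # mild initial affinity for u
x_dislike = normalize(-0.3 * u + v)     # mild initial aversion to u

for name, x0 in [("like", x_like), ("dislike", x_dislike)]:
    plain = simulate(x0, u, biased=False) @ u
    biased = simulate(x0, u, biased=True) @ u
    print(f"{name}: assimilation <x_T,u> = {plain:+.3f}, biased = {biased:+.3f}")
```

Under these assumptions both users end up aligned with `u` under plain assimilation, while under biased assimilation the user who starts with negative affinity ends up aligned with `-u`.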
Regardless of whether assimilation is biased, a personalized fixed recommendation \(u_t=u\) gives
$$ x_t = \alpha_t x_0 + \beta_t u$$
\(\alpha_t\): positive and decreasing
\(\beta_t\): increasing magnitude (same sign as \(\langle x_0, u\rangle\) if biased assimilation)
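The decomposition \(x_t = \alpha_t x_0 + \beta_t u\) can be tracked numerically by carrying the two coefficients through each normalized update. The sketch below assumes a constant step size `eta` and hand-picked example vectors, and for the unbiased case it assumes \(\langle x_0, u\rangle > 0\). The function name `coefficients` is hypothetical.

```python
import numpy as np

def coefficients(x0, u, eta=0.2, T=30, biased=False):
    """Return (alpha, beta, x_T) with x_t = alpha[t] * x0 + beta[t] * u."""
    alpha, beta = [1.0], [0.0]
    x = x0.copy()
    for _ in range(T):
        w = eta * (x @ u) if biased else eta
        norm = np.linalg.norm(x + w * u)
        # x_{t+1} = (alpha_t x0 + (beta_t + w) u) / norm
        alpha.append(alpha[-1] / norm)
        beta.append((beta[-1] + w) / norm)
        x = (x + w * u) / norm
    return np.array(alpha), np.array(beta), x

u = np.array([1.0, 0.0, 0.0])
x_like = np.array([0.3, 0.0, 1.0]) / np.linalg.norm([0.3, 0.0, 1.0])
x_dislike = np.array([-0.3, 0.0, 1.0]) / np.linalg.norm([-0.3, 0.0, 1.0])

a1, b1, x1 = coefficients(x_like, u, biased=False)    # assimilation
a2, b2, x2 = coefficients(x_dislike, u, biased=True)  # biased assimilation
print("assimilation: alpha_T, beta_T =", a1[-1], b1[-1])
print("biased:       alpha_T, beta_T =", a2[-1], b2[-1])
```

In the biased run the initial affinity is negative, so `beta` is negative and grows in magnitude, matching the sign condition stated above.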
Implications [DM22]
It is not necessary to identify preferences to make high affinity recommendations
Preferences "collapse" towards whatever users are often recommended
Non-manipulation (and other goals) can be achieved through randomization
Preference dynamics lead to a new perspective on harm
Simple definition: harm caused by consumption of harmful content
Simple choice model: given a recommendation, a user
\(\mathbb P\{\mathrm{click}\}\)
Without preference dynamics, harm minimizing policy is the engagement maximizing policy (excluding harmful items)
[Figure: \(\mathbb P\{\mathrm{click}\}\) for recommendations ♫ and 𝅘𝅥]
With preference dynamics, there may be downstream harm, even when no harmful content is recommended
This motivates a new recommendation objective which takes into account the probability of future harm [CDEIKW24]
2. Learning Dynamics from Bilinear Observations
inputs \(u_t\) (e.g. playlist attributes)
outputs \(y_t\) (e.g. listen time)
Input: data \((u_0,y_0,...,u_T,y_T)\), history length \(L\), state dim \(n\)
Step 1: Regression
$$\hat G = \arg\min_{G\in\mathbb R^{p\times pL}} \sum_{t=L}^T \big( y_t - u_t^\top \textstyle \sum_{k=1}^L G[k] u_{t-k} \big)^2 $$
Step 2: Decomposition \(\hat A,\hat B,\hat C = \mathrm{HoKalman}(\hat G, n)\)
Using \(u_t^\top \textstyle\sum_{k=1}^L G[k] u_{t-k} = (\bar u_{t-1}^\top \otimes u_t^\top)\, \mathrm{vec}(G)\), with \(\bar u_{t-1}\) stacking \(u_{t-1},\dots,u_{t-L}\), Step 1 is an ordinary least-squares problem in \(\mathrm{vec}(G)\).
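The regression step can be sketched with Kronecker-product features and an off-the-shelf least-squares solver. The data-generating model is not written out above, so the sketch assumes a linear system with bilinear observations, \(x_{t+1} = Ax_t + Bu_t + w_t\) and \(y_t = u_t^\top C x_t + v_t\), for which \(G[k] = CA^{k-1}B\). The matrices, noise level, and function names (`simulate`, `estimate_markov`) are illustrative assumptions.

```python
import numpy as np

def simulate(A, B, C, T, rng, noise=0.01):
    """Assumed model: x_{t+1} = A x_t + B u_t + w_t,  y_t = u_t^T C x_t + v_t."""
    n, p = B.shape
    x = np.zeros(n)
    U = rng.standard_normal((T + 1, p)) / np.sqrt(p)   # i.i.d. Gaussian inputs
    Y = np.zeros(T + 1)
    for t in range(T + 1):
        Y[t] = U[t] @ C @ x + noise * rng.standard_normal()
        x = A @ x + B @ U[t] + noise * rng.standard_normal(n)
    return U, Y

def estimate_markov(U, Y, L):
    """Least squares for G = [G[1], ..., G[L]] in R^{p x pL}."""
    T1, p = U.shape
    rows, targets = [], []
    for t in range(L, T1):
        # stacked past inputs u_{t-1}, ..., u_{t-L}
        u_bar = np.concatenate([U[t - k] for k in range(1, L + 1)])
        # row (u_bar^T kron u_t^T), paired with column-major vec(G)
        rows.append(np.kron(u_bar, U[t]))
        targets.append(Y[t])
    g, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return g.reshape(p, p * L, order="F")   # undo the column-major vec

rng = np.random.default_rng(0)
A = np.array([[0.5, 0.2], [0.0, 0.3]])      # spectral radius 0.5
B = np.array([[1.0, 0.0], [0.5, 1.0]])
C = np.array([[1.0, -0.5], [0.0, 1.0]])
L, T = 6, 3000
U, Y = simulate(A, B, C, T, rng)
G_hat = estimate_markov(U, Y, L)
G_true = np.hstack([C @ np.linalg.matrix_power(A, k - 1) @ B
                    for k in range(1, L + 1)])
print("Frobenius error:", np.linalg.norm(G_hat - G_true))
```

The `order="F"` reshape matters: the Kronecker identity pairs with the column-stacking convention for \(\mathrm{vec}(G)\), so a default row-major reshape would scramble the blocks.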
With probability at least \(1-\delta\), $$\epsilon_G=\|G-\hat G\|_{\tilde U^\top \tilde U} \lesssim \sqrt{ \frac{p^2 L}{\delta} \cdot c_{\mathrm{stability,noise}} }+ \rho(A)^L\sqrt{T} c_{\mathrm{stability}}$$
Suppose \(L\) is sufficiently large. Then, there exists a nonsingular matrix \(S\) (i.e. a similarity transform) such that
$$\|A-S\hat AS^{-1}\|_{F},\ \| B-S\hat B\|_{F},\ \| C-\hat CS^{-1}\|_{F} \lesssim c_{\mathrm{contr,obs,dim}} \frac{\|G-\hat G\|_{F}}{\sqrt{\sigma_{\min}(\tilde U^\top \tilde U)}} $$
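For the decomposition step, a textbook Ho-Kalman construction gives a feel for how \(\hat A,\hat B,\hat C\) come out of the Markov parameters. This is a sketch under assumptions: it takes \(G[k] = CA^{k-1}B\), picks one particular Hankel shape, and uses a hypothetical function name `ho_kalman`. The `HoKalman` routine referenced in the algorithm may differ in these details. It is run here on exact Markov parameters, so the recovered system matches the true one up to a similarity transform.

```python
import numpy as np

def ho_kalman(G, n, p):
    """Recover (A, B, C) up to similarity from G = [G[1], ..., G[L]], G[k] = C A^{k-1} B."""
    L = G.shape[1] // p
    blocks = [G[:, k * p:(k + 1) * p] for k in range(L)]   # blocks[k] = C A^k B
    T1 = L // 2                 # block rows of the Hankel matrix
    T2 = L - T1 - 1             # block columns of H_minus and H_plus
    H = np.block([[blocks[i + j] for j in range(T2 + 1)] for i in range(T1)])
    H_minus, H_plus = H[:, :p * T2], H[:, p:]              # H_plus is shifted by one block
    U, s, Vt = np.linalg.svd(H_minus, full_matrices=False)
    O = U[:, :n] * np.sqrt(s[:n])                          # observability factor
    Q = np.sqrt(s[:n])[:, None] * Vt[:n]                   # controllability factor
    C_hat = O[:p]                                          # first block row
    B_hat = Q[:, :p]                                       # first block column
    A_hat = np.linalg.pinv(O) @ H_plus @ np.linalg.pinv(Q)
    return A_hat, B_hat, C_hat

A = np.array([[0.5, 0.2], [0.0, 0.3]])
B = np.array([[1.0, 0.0], [0.5, 1.0]])
C = np.array([[1.0, -0.5], [0.0, 1.0]])
n, p, L = 2, 2, 6
G = np.hstack([C @ np.linalg.matrix_power(A, k) @ B for k in range(L)])
A_hat, B_hat, C_hat = ho_kalman(G, n, p)

# The estimates differ from (A, B, C) by a similarity transform,
# so compare similarity-invariant quantities instead of the matrices themselves.
G_rebuilt = np.hstack([C_hat @ np.linalg.matrix_power(A_hat, k) @ B_hat
                       for k in range(L)])
print("Markov parameter mismatch:", np.linalg.norm(G - G_rebuilt))
print("eigenvalues of A_hat:", np.sort(np.linalg.eigvals(A_hat).real))
```

With a noisy \(\hat G\) in place of `G`, the rank-`n` truncation of the SVD is what filters the Hankel matrix down to the stated state dimension.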
With high probability, $$\mathrm{est.~errors} \lesssim \sqrt{ \frac{\mathsf{poly}(\mathrm{dimension})}{\sigma_{\min}(\tilde U^\top \tilde U)}}$$
When \(u_t\) are chosen i.i.d. and sub-Gaussian and \(T\) is large enough, whp $$\sigma_{\min}({\tilde U^\top \tilde U} )\gtrsim T$$
For i.i.d. and sub-Gaussian inputs, whp $$\mathrm{est.~errors} \lesssim \sqrt{ \frac{\mathsf{poly}(\mathrm{dim.})}{T}}$$
How large does \(T\) need to be to guarantee bounded estimation error?
formal analysis involves the structured random matrix \(\tilde U\)
Other References
more details on affinity maximization, preference stationarity, and mode collapse
(Oymak & Ozay, 2019)
$$\hat G = \arg\min_{G\in\mathbb R^{p\times pL}} \sum_{t=L}^T \big( y_t - \bar u_{t-1}^\top \otimes u_t^\top \mathrm{vec}(G) \big)^2 $$
Set of equivalent state space representations for all invertible and square \(M\)
\(s_{t+1} = As_t + Bw_t\)
\(y_t = Cs_t+v_t\)
\(\tilde s_{t+1} = \tilde A\tilde s_t + \tilde B w_t\)
\(y_t = \tilde C\tilde s_t+v_t\)
\(\tilde s = M^{-1}s\)
\(\tilde A = M^{-1}AM\)
\(\tilde B = M^{-1}B\)
\(\tilde C = CM\)
\( s = M\tilde s\)
\( A = M\tilde AM^{-1}\)
\( B = M\tilde B\)
\(C = \tilde CM^{-1}\)
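A quick numerical check illustrates why the parameters can only be identified up to such a transform: the products \(CA^kB\) are unchanged by it. The matrices below are arbitrary illustrative values, with `M` built to be upper triangular with a nonzero diagonal so that it is guaranteed invertible.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 3, 2
A = 0.5 * rng.standard_normal((n, n))
B = rng.standard_normal((n, p))
C = rng.standard_normal((p, n))

# Upper triangular with nonzero diagonal, hence invertible.
M = np.triu(rng.standard_normal((n, n)), k=1) + np.diag([1.0, 2.0, 3.0])
M_inv = np.linalg.inv(M)

A_t = M_inv @ A @ M     # tilde A
B_t = M_inv @ B         # tilde B
C_t = C @ M             # tilde C

def markov(A, B, C, K=5):
    """Stack C A^k B for k = 0, ..., K-1."""
    return np.hstack([C @ np.linalg.matrix_power(A, k) @ B for k in range(K)])

gap = np.linalg.norm(markov(A, B, C) - markov(A_t, B_t, C_t))
print("difference in Markov parameters:", gap)
```

Since the regression only ever sees these products, no amount of data distinguishes \((A,B,C)\) from \((\tilde A,\tilde B,\tilde C)\).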
By Sarah Dean