Sarah Dean
Assistant Professor of Computer Science, Cornell University
4th Annual Learning for Dynamics & Control Conference, June 2022
historical movie ratings → new movie rating
The recommendation feedback loop: a recommender policy selects recommended content \(u_t\); the user responds with expressed preferences \(y_t = g_t(x_t, u_t)\); and interests may be impacted by recommended content, so that \(x_{t+1} = f_t(x_t, u_t)\).
The linear observation model \(y_t = \langle x_t, u_t\rangle + w_t\) underlies factorization-based methods.
A model inspired by biased assimilation updates preferences proportionally to affinity:
\(x_{t+1} \propto x_t + \eta_t \langle x_t, u_t\rangle u_t\)
with preferences \(x_t \in \mathbb{S}^{d-1}\), items \(u_t \in \mathcal{U} \subseteq \mathbb{S}^{d-1}\), and observations \(y_t = \langle x_t, u_t\rangle + w_t\).
Proposed by Hązła et al. (2019) as a model of opinion dynamics.
Non-personalized exposure leads to polarization (Hązła et al. 2019; Gaitonde et al. 2021).
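A minimal simulation sketch of this update (Python; the item set, step size \(\eta\), noise level, and horizon below are illustrative assumptions, not from the talk):

```python
# Minimal sketch (assumed setup, not the talk's code): simulate the biased-assimilation
# update x_{t+1} ∝ x_t + eta * <x_t, u_t> * u_t on the unit sphere, with noisy
# affinity observations y_t = <x_t, u_t> + w_t under non-personalized (random) exposure.
import numpy as np

rng = np.random.default_rng(0)
d, T, sigma, eta = 5, 200, 0.1, 1.0        # dimension, horizon, noise, step size (assumptions)

def normalize(v):
    return v / np.linalg.norm(v)

x = normalize(rng.normal(size=d))                              # preference x_0 on S^{d-1}
items = [normalize(rng.normal(size=d)) for _ in range(10)]     # item set U ⊆ S^{d-1}

for _ in range(T):
    u = items[rng.integers(len(items))]     # non-personalized exposure: random item
    y = x @ u + sigma * rng.normal()        # observed affinity y_t (not used by the dynamics)
    x = normalize(x + eta * (x @ u) * u)    # biased-assimilation update, renormalized

print("final affinities:", np.round([x @ u for u in items], 3))
```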
Personalized fixed recommendation \(u_t = u\): the preferences evolve as \(x_t = \alpha_t x_0 + \beta_t u\), where \(\alpha_t\) is positive and decreasing and \(\beta_t\) has increasing magnitude and the same sign as \(\langle x_0, u\rangle\).
Regret of the fixed strategy: \(R(T) = \sum_{t=0}^{T-1} \bigl(1 - \langle x_t, u_t\rangle\bigr)\).
Result: As long as \(|\langle x_0, u\rangle| > c\) and the noise is \(\sigma^2\) sub-Gaussian, \(R(T) \le C_\eta (1/c^2 - 1) + \sigma^2 \log T / c^2\).
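A hedged, noiseless sanity check of this behavior (the dimension, step size, and horizon are assumptions, and \(u\) is oriented so that \(\langle x_0, u\rangle > 0\)):

```python
# Noiseless sanity check of the fixed-recommendation setting (assumed parameters):
# with u_t = u fixed and <x_0, u> bounded away from zero, the affinity <x_t, u>
# approaches 1 and R(T) = sum_t (1 - <x_t, u_t>) stays small relative to T.
import numpy as np

rng = np.random.default_rng(1)
d, T, eta = 5, 1000, 1.0

def normalize(v):
    return v / np.linalg.norm(v)

x = normalize(rng.normal(size=d))
u = normalize(rng.normal(size=d))
u = np.sign(x @ u) * u                       # orient u so that <x_0, u> > 0

regret = 0.0
for _ in range(T):
    regret += 1.0 - x @ u                    # instantaneous regret 1 - <x_t, u_t>
    x = normalize(x + eta * (x @ u) * u)     # biased-assimilation update with fixed u

print(f"final affinity {x @ u:.4f}, R(T) = {regret:.3f} over T = {T} rounds")
```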
Alg: Explore-then-Commit. Explore to estimate preferences, then commit to the item with the maximum possible affinity; the regret of explore-then-commit can then be bounded.
Achieving high affinity is straightforward when \(\mathcal U\) contains "opposites".
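A rough sketch of an explore-then-commit loop (the exploration length, candidate set, and least-squares estimator are illustrative choices, not the talk's exact algorithm):

```python
# Rough explore-then-commit sketch under assumed parameters: explore with random items
# and noisy affinities, estimate preferences by least squares, then commit to the
# candidate item, or its "opposite", with the largest estimated affinity.
import numpy as np

rng = np.random.default_rng(2)
d, T_explore, T_commit, sigma, eta = 5, 50, 500, 0.1, 1.0

def normalize(v):
    return v / np.linalg.norm(v)

x = normalize(rng.normal(size=d))                          # unknown preference state
candidates = [normalize(rng.normal(size=d)) for _ in range(8)]
candidates += [-u for u in candidates]                     # include "opposites" ±u

U_hist, y_hist = [], []
for _ in range(T_explore):                                 # exploration phase
    u = normalize(rng.normal(size=d))
    y_hist.append(x @ u + sigma * rng.normal())            # noisy affinity y_t
    U_hist.append(u)
    x = normalize(x + eta * (x @ u) * u)                   # preferences drift while exploring

x_hat, *_ = np.linalg.lstsq(np.array(U_hist), np.array(y_hist), rcond=None)
u_star = max(candidates, key=lambda u: x_hat @ u)          # commit to best estimated item

for _ in range(T_commit):                                  # commit phase
    x = normalize(x + eta * (x @ u_star) * u_star)
print(f"affinity after committing: {x @ u_star:.4f}")
```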
Non-manipulation (Krueger et al., 2020) is an alternative goal, with regret \(R(T) = \sum_{t=0}^{T-1} \bigl(1 - \langle x_0, x_t\rangle\bigr)\).
When \(x_0 \notin \mathcal U\), use a randomized strategy to select \(u_t\) i.i.d., so that \(\mathbb{E}[x_{t+1}] \propto (I + \eta_t\, \mathbb{E}[u u^\top])\, x_t\).
Informal Result: Suppose \(x_0\) is the dominant eigenvector of \(\mathbb{E}[u u^\top]\) and the step size \(\eta_t\) decays like \(\frac{1}{1+t}\). Then \(R(T) \lesssim \log T\).
Proof sketch:
\(\langle x_0, x_t\rangle = \dfrac{x_0^\top (I + u_{t-1} u_{t-1}^\top)\cdots(I + u_0 u_0^\top)\, x_0}{\bigl\|(I + u_{t-1} u_{t-1}^\top)\cdots(I + u_0 u_0^\top)\, x_0\bigr\|_2}\)
Using concentration for matrix products (Huang et al., 2021), \(1 - \langle x_0, x_t\rangle^2 \lesssim \frac{1}{t}\).
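An illustrative simulation of the randomized strategy (the particular item distribution, which makes \(x_0 = e_1\) the dominant eigenvector of \(\mathbb{E}[uu^\top]\), is an assumption):

```python
# Illustrative simulation (assumed item distribution, not the talk's): with step size
# eta_t = 1/(1+t) and x_0 the dominant eigenvector of E[uu^T], the alignment <x_0, x_t>
# stays near 1 and the non-manipulation regret grows slowly, consistent with
# 1 - <x_0, x_t>^2 decaying roughly like 1/t.
import numpy as np

rng = np.random.default_rng(3)
d, T = 5, 2000

def normalize(v):
    return v / np.linalg.norm(v)

x0 = np.eye(d)[0]                              # take x_0 = e_1 without loss of generality
x = x0.copy()

def sample_item():
    # Half the time recommend x_0 itself; otherwise a random direction orthogonal to x_0,
    # so E[uu^T] = 0.5 e_1 e_1^T + 0.5/(d-1) (I - e_1 e_1^T) has x_0 as dominant eigenvector.
    if rng.random() < 0.5:
        return x0
    v = rng.normal(size=d)
    v[0] = 0.0
    return normalize(v)

regret = 0.0
for t in range(T):
    u = sample_item()
    eta = 1.0 / (1.0 + t)                      # decaying step size
    x = normalize(x + eta * (x @ u) * u)       # biased-assimilation update
    regret += 1.0 - x0 @ x

print(f"<x0, x_T> = {x0 @ x:.4f}, non-manipulation regret = {regret:.3f}")
```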
Rather than polarization, preferences may "collapse", but this can be avoided using randomization.
Necessary to have \(x_0 \in \mathrm{span}(\mathcal U)\) for such a randomized strategy to exist.
Observation function \(F(x_0; u_{0:T}) = y_{0:T}\), where \(y_t = \langle x_t, u_t\rangle\).
Result: \(F: \mathbb{S}^{d-1} \to \mathbb{R}^{T}\) is locally invertible if and only if \(u_{0:T}\) span \(\mathbb{R}^d\).
Find \(q\) such that \(q \ge 0\), \(U \,\mathrm{diag}(U^\top x_0)\, q = x_0\), and \(I - U \,\mathrm{diag}(q)\, U^\top \succeq 0\).
Result: \(x_0\) is the dominant eigenvector if randomization is proportional to \(q\).
Result: The problem is feasible if and only if \(x_0\) is in the span of \(\tilde{\mathcal U} = \{\mathrm{sign}(u^\top x_0)\cdot u \mid u \in \mathcal U\}\).
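A sketch of checking this feasibility program numerically (the synthetic item set and the linear-programming relaxation with a post-hoc PSD check are assumptions; a full treatment would enforce the semidefinite constraint directly):

```python
# Sketch (assumed setup): solve for q >= 0 with U diag(U^T x0) q = x0 via a linear
# program minimizing sum(q), then verify I - U diag(q) U^T >= 0 after the fact.
# An SDP solver would be needed to impose the semidefinite constraint directly.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(4)
d, K = 4, 10
U = rng.normal(size=(d, K))
U /= np.linalg.norm(U, axis=0)                   # item columns on the unit sphere
x0 = U[:, 0]                                     # pick x0 in the span of the items

A_eq = U @ np.diag(U.T @ x0)                     # equality constraint U diag(U^T x0) q = x0
res = linprog(c=np.ones(K), A_eq=A_eq, b_eq=x0, bounds=[(0, None)] * K)

if res.success:
    q = res.x
    M = np.eye(d) - U @ np.diag(q) @ U.T
    print("q =", np.round(q, 3))
    print("PSD condition holds:", bool(np.all(np.linalg.eigvalsh(M) >= -1e-9)))
else:
    print("infeasible: x0 is not in the span of the (sign-adjusted) items")
```

Sampling items with probabilities proportional to \(q\) gives \(\mathbb{E}[uu^\top]\, x_0 = x_0 / \sum_k q_k\), while the condition \(I - U\,\mathrm{diag}(q)\, U^\top \succeq 0\) caps every eigenvalue of \(\mathbb{E}[uu^\top]\) at \(1/\sum_k q_k\), which is why \(x_0\) ends up dominant.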
Open questions:
Does social media have the ability to manipulate us, or merely to segment and target?
Users choose to participate depending on accuracy (e.g. music recommendation), creating a self-reinforcing feedback loop when learners retrain. For a single learner, this leads to representation disparity (Hashimoto et al., 2018; Zhang et al., 2019).
Sub-populations \(i \in [n]\) and learners \(j \in [m]\). Participation and models evolve jointly (a toy simulation is sketched below):
\(\alpha_{t+1} = \nu(\alpha_t, \Theta_t)\): allocations evolve according to the risks \(R_i(\theta_j)\);
\(\Theta_{t+1} = \mu(\alpha_{t+1}, \Theta_t)\): learners are "risk minimizing in the limit", with strongly convex risks.
Example: linear regression with sub-populations 1 and 2.
Definition: The total risk is \(R(\alpha, \Theta) = \sum_{i=1}^n \sum_{j=1}^m \alpha_{ij} R_i(\theta_j)\).
Result: An equilibrium \((\alpha_{\mathrm{eq}}, \Theta_{\mathrm{eq}})\) must have \(\Theta_{\mathrm{eq}} = \arg\min_{\Theta} R(\alpha_{\mathrm{eq}}, \Theta)\) and is asymptotically stable if and only if it is an isolated local minimum of the total risk.
Proof sketch: asymptotically stable equilibria correspond to the isolated local minima of \(R(\alpha, \Theta)\).
Definition: In a split market, each sub-pop \(i\) allocates all participation to a single learner \(\gamma(i)\).
Utilitarian social welfare is inversely related to total risk
A notion of fairness is the worst-case risk over sub-pops
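The toy instantiation of these participation dynamics promised above (the multiplicative-weights rule for \(\nu\), exact risk minimization for \(\mu\), and scalar mean-estimation risks are all illustrative assumptions, not the talk's model); it reports the total risk and the worst-case sub-population risk at the end of the run:

```python
# Toy instantiation (assumed nu, mu, and risks): n sub-populations reallocate toward
# lower-risk learners via multiplicative weights, and each learner retrains to the
# minimizer of its participation-weighted (strongly convex) risk.
import numpy as np

rng = np.random.default_rng(5)
n, m, T, lam = 3, 2, 300, 5.0
opt = np.array([-1.0, 0.0, 1.0])            # each sub-population's own risk minimizer

alpha = np.full((n, m), 1.0 / m) + 0.01 * rng.random((n, m))   # alpha[i, j]: participation
alpha /= alpha.sum(axis=1, keepdims=True)
theta = rng.normal(size=m)                  # learner models (scalars for simplicity)

def risks(theta):
    # strongly convex risks R_i(theta_j) = (theta_j - opt_i)^2, shape (n, m)
    return (theta[None, :] - opt[:, None]) ** 2

for _ in range(T):
    R = risks(theta)
    alpha = alpha * np.exp(-lam * R)                    # nu: favor lower-risk learners
    alpha /= alpha.sum(axis=1, keepdims=True)
    w = alpha / np.maximum(alpha.sum(axis=0), 1e-12)    # mu: each learner minimizes its
    theta = (w * opt[:, None]).sum(axis=0)              # participation-weighted risk

R = risks(theta)
print("allocations alpha:\n", np.round(alpha, 3))
print("learner models Theta:", np.round(theta, 3))
print(f"total risk R(alpha, Theta) = {(alpha * R).sum():.4f}")
print(f"worst-case sub-population risk = {(alpha * R).sum(axis=1).max():.4f}")
```

In runs like this the allocations tend to concentrate, with each sub-population placing essentially all of its participation on a single learner, i.e. a split market.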
Open questions:
Study dynamics to what end?
How to bridge the social and the technical? (Gilbert et al., 2022)
References
Krueger, Maharaj, Leike, 2020. Hidden incentives for auto-induced distributional shift. arXiv:2009.09153.