User Dynamics in Machine Learning Systems

Sarah Dean

Assistant Professor of Computer Science, Cornell University

BIRS Workshop, November 2025

Dynamics arise from actions impacting the world and from data impacting the policy.

Machine Learning Systems

[Diagram: the platform mediates between users (consumers) and creators (producers); platforms themselves compete in markets]

Outline

1. Preference dynamics

2. Participation dynamics

Personalization

[Diagram: the recommender policy selects recommended content \(a_t\); a user with preference state \(s_t\) returns expressed preferences \(y_t = \langle s_t, a_t\rangle + v_t\)]

Interests may be impacted by recommended content

[Diagram: the same loop between recommender policy, recommended content, and expressed preferences, now with the preference state updating from \(s_t\) to \(s_{t+1}\) in response to recommended content]

Personalization

  • Simple dynamics that capture assimilation (adapted from opinion dynamics): $$s_{t+1} \propto s_t + \eta_t a_t,\qquad y_t = s_t^\top a_t + v_t$$
  • If \(\eta_t\) is constant, preferences tend to homogenize globally
  • If \(\eta_t \propto s_t^\top a_t\) (i.e. biased assimilation), preferences tend to polarize (both behaviors are illustrated in the sketch below)
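A minimal simulation sketch of these two regimes, assuming unit-norm preference and item vectors and a single fixed recommended direction; the update rule is the one above, but the step size, horizon, and item choice are illustrative assumptions.

```python
import numpy as np

def update(s, a, eta0=0.1, biased=False):
    """One step of the assimilative update s_{t+1} ∝ s_t + eta_t * a_t."""
    eta = eta0 * float(s @ a) if biased else eta0  # biased assimilation: step scales with affinity
    s_next = s + eta * a
    return s_next / np.linalg.norm(s_next)         # keep preferences on the unit sphere

rng = np.random.default_rng(0)
a = np.array([1.0, 0.0])                           # fixed recommended item direction
users = [v / np.linalg.norm(v) for v in rng.standard_normal((5, 2))]

for biased in (False, True):
    states = [s.copy() for s in users]
    for _ in range(200):
        states = [update(s, a, biased=biased) for s in states]
    affinities = np.round([float(s @ a) for s in states], 2)
    print("biased eta" if biased else "constant eta", "-> final affinities:", affinities)
# Constant eta: all users end up aligned with a (homogenization).
# Biased eta: users drift toward +a or -a depending on their initial lean (polarization).
```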

Preference Dynamics

[Figure: example trajectory showing an initial preference, a recommendation, and the resulting preference]

  1. Preference Dynamics Under Personalized Recommendations, EC 2022 (arXiv:2205.13026), with Jamie Morgenstern
  2. Harm Mitigation in Recommender Systems under User Preference Dynamics, KDD 2024 (arXiv:2406.09882), with Chee, Kalyanaraman, Ernala, Weinsberg, Ioannidis

Preference Dynamics

Implications for personalization [DM22]

  1. It is not necessary to estimate preferences to make "good" recommendations

  2. Preferences "collapse" toward whatever content users are repeatedly recommended

  3. "Preference control" can be achieved through randomization

Harm in recommendations [CKEWDI24]

  1. Simple operationalization: harm caused by consumption of harmful content

  2. Even if harmful content is never recommended, recommendations can cause harm through preference shifts (see the sketch below)
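A minimal sketch of this mechanism under the same assimilative dynamics: the harmful direction \(h\) is never recommended, but a benign item correlated with \(h\) pulls the preference state toward it. The specific vectors and step size are illustrative assumptions.

```python
import numpy as np

def step(s, a, eta=0.1):
    s_next = s + eta * a                 # assimilative update toward the recommended item
    return s_next / np.linalg.norm(s_next)

h = np.array([0.0, 1.0])                 # harmful content direction (never recommended)
a = np.array([0.6, 0.8])                 # benign item, correlated with h (unit norm)
s = np.array([1.0, 0.0])                 # user initially indifferent to harmful content

print("initial affinity for harmful content:", round(float(s @ h), 2))
for _ in range(100):
    s = step(s, a)                       # only the benign item is ever recommended
print("affinity after repeated benign recommendations:", round(float(s @ h), 2))
# Preferences drift toward a, and therefore toward h: harm arises via preference shift,
# even though the harmful content itself was never recommended.
```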


Outline

1. Preference dynamics

2. Participation dynamics

Two-sided platforms

On online platforms, algorithms mediate the experience of users, both viewers and creators


Exposure: reaching a large audience

  • determines each creator's population
  • in turn, population determines quality

Satisfaction: high quality and interesting content

  • determines each viewer's population

Total welfare: sum of user satisfaction, weighted by user population

Two-sided platforms

Participation dynamics: interplay of exposure, quality, and satisfaction
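A minimal simulation sketch of this feedback loop, not the model in [KYD25]: the functional forms for quality growth and viewer participation are assumptions chosen only to illustrate how exposure, quality, satisfaction, and population-weighted welfare interact.

```python
import numpy as np

rng = np.random.default_rng(1)
n_viewers, n_creators = 3, 4
interest = rng.uniform(0.2, 1.0, size=(n_viewers, n_creators))  # how much each viewer group likes each creator
policy = np.full((n_viewers, n_creators), 1.0 / n_creators)     # uniform exposure policy (illustrative)
viewer_pop = np.ones(n_viewers)                                 # population of each viewer group

for _ in range(50):
    creator_pop = policy.T @ viewer_pop             # exposure determines each creator's audience
    quality = creator_pop / (1.0 + creator_pop)     # assumed: quality increases with creator population
    satisfaction = (policy * quality * interest).sum(axis=1)  # expected satisfaction per viewer group
    viewer_pop = np.exp(2.0 * satisfaction)         # assumed: participation grows with satisfaction
    viewer_pop *= n_viewers / viewer_pop.sum()      # keep total viewer population fixed

welfare = float(viewer_pop @ satisfaction)          # total welfare: population-weighted satisfaction
print("total welfare under uniform exposure:", round(welfare, 3))
```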

  • Implications [KYD25]:
    • The standard "myopic" approach to recommendation is almost never optimal in the long term
    • "Uniform" recommendation, which guarantees creator exposure without personalizing to user interests, can perform well
    • A policy that approximates the long-term outcome can balance exposure and satisfaction

Policy Design for Two-sided Platforms with Participation Dynamics (arXiv:2502.01792)
Haruka Kiyohara, Fan Yao, Sarah Dean

  • Individuals choose among services depending on accuracy
  • Services optimize for accuracy based on observed data
  • Both are locally accuracy maximizing (loss minimizing); see the sketch below
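A minimal sketch of this retraining loop, assuming one-dimensional users, squared loss, and mean-fitting services. The population and initialization are illustrative assumptions, but the loop shows users choosing the more accurate service while each service loss-minimizes on its current users.

```python
import numpy as np

rng = np.random.default_rng(2)
users = np.concatenate([rng.normal(-2, 0.3, 50), rng.normal(2, 0.3, 50)])  # two latent user groups
models = np.array([0.5, -0.5])                                             # two services' initial predictions

for _ in range(20):
    # Each user joins the service whose current model is most accurate for them.
    losses = (users[:, None] - models[None, :]) ** 2
    choice = losses.argmin(axis=1)
    # Each service then loss-minimizes on the users it currently serves.
    for j in range(len(models)):
        if np.any(choice == j):
            models[j] = users[choice == j].mean()

print("service models after retraining:", np.round(models, 2))
print("users served by each service:   ", np.bincount(choice, minlength=len(models)))
# The market segments: each service specializes to one group and no user prefers to switch.
```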

Choice between platforms

  1. Emergent specialization from participation dynamics and multi-learner retraining, AISTATS 2024 (arXiv:2206.02667), with Mihaela Curmei, Lillian J. Ratliff, Jamie Morgenstern, Maryam Fazel
  2. Learning from Streaming Data when Users Choose, ICML 2024 (arXiv:2406.01481), with Jinyan Su
  3. Initializing Services in Interactive ML Systems for Diverse Users (arXiv:2312.11846), with Avinandan Bose, Mihaela Curmei, Daniel L. Jiang, Jamie Morgenstern, Lillian J. Ratliff, Maryam Fazel

Choice between platforms

Main Result [DCRMF24]: Under regularity conditions on the loss, the only stable equilibria are segmented markets where no user prefers to switch

Choice between platforms

Implications [DCRMF24]:

  • ✅ Stable equilibria (locally) maximize utilitarian social welfare (minimize average loss)
    • ✅ Streaming learning algorithms asymptotically converge to these equilibria [SD24]
  • ❌ Generally NP-hard to find the global maximum, by analogy to clustering
    • ✅ A simple coordinated initialization finds a good approximation [BCJMDRF24] (see the sketch below)
  • ❌ Utilitarian welfare does not guarantee low worst-case loss
  • ✅ Increasing the number of learners decreases loss and increases social welfare
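A minimal sketch of what coordinated initialization could look like, in the spirit of [BCJMDRF24] but not its algorithm: a Lloyd-style clustering of (one-dimensional) user data places each service near a distinct user group before the retraining dynamics run.

```python
import numpy as np

def coordinated_init(users, n_services, n_iters=20, seed=0):
    """Hypothetical coordinated initialization: Lloyd-style clustering of user data
    so that each service starts near a distinct group of users."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(users, size=n_services, replace=False)
    for _ in range(n_iters):
        assign = np.abs(users[:, None] - centers[None, :]).argmin(axis=1)
        for j in range(n_services):
            if np.any(assign == j):
                centers[j] = users[assign == j].mean()
    return centers

rng = np.random.default_rng(3)
users = np.concatenate([rng.normal(m, 0.3, 40) for m in (-3.0, 0.0, 3.0)])
print("initial service models:", np.round(coordinated_init(users, n_services=3), 2))
```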


Summary

1. Preference dynamics

2. Participation dynamics

Ongoing work: learning models from partial observations (both theory and application)

Safety Challenges

1. Absence of low-level control interface

  • Current algorithms tweak rankings in an ad hoc manner
  • Lacking a standard abstraction for high-level goals like exposure, diversity, and fairness

Safety Challenges

2. Unknown dynamics with limited controllability

  • Cannot appeal to well-established laws of physics
  • Social dynamics driven largely by factors outside the purview of any single algorithmic system


Safety Challenges

3. What are harms and who decides?

  • Human values are indeterminate and hard to formalize
  • Need to bridge the gap between the social and the technical


  Facing these challenges requires solving rich technical problems [BFDJ24] as well as drawing on interdisciplinary perspectives [LD22; GDLZS22].

  1. Ranking with Long-Term Constraints with Kianté Brantley, Zhichong Fang, Sarah Dean, Thorsten Joachims
  2. Engineering a Safer Recommender System with Liu Leqi
  3. Reward Reports for Reinforcement Learning with Thomas Krendl Gilbert, Nathan Lambert, Tom Zick, and Aaron Snoswell
