Alignment for Whom?
Bias, Welfare, and Collective Choice in AI

Carina I Hausladen


Who got to express their preferences? Who did not?

Whose values are we prioritizing?

Who is being harmed?

Who is being helped?

Alignment for Whom?
Bias, Welfare, and Collective Choice in AI

How do we measure bias in the social sciences?

Stereotypes live in a two-axis space.
 

Fiske, Cuddy, Glick & Xu 2002

Paternalised
elderly, disabled
Admired
in-group, middle-class
Contempt
homeless, drug users
Envied
rich, Jewish (US data)

Competence
can the group act on its intentions?

Warmth 

Is the group cooperative or threatening?

Lakisha: same CV

Emily: same CV

Are Emily and Greg More Employable than Lakisha and Jamal?
Bertrand & Mullainathan (2004)

The audit-study tradition

~50% callback gap
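
The "~50%" figure is a relative gap, not a difference in percentage points. A minimal sketch of the arithmetic follows. The two callback rates are the values I recall from Bertrand & Mullainathan (2004) and should be treated as an assumption to check against the paper.

```python
# Callback rates as recalled from Bertrand & Mullainathan (2004).
# Assumption: verify these inputs against the paper before reusing them.
white_rate = 0.0965   # resumes with white-sounding names
black_rate = 0.0645   # resumes with Black-sounding names

absolute_gap = white_rate - black_rate      # difference between the two rates
relative_gap = white_rate / black_rate - 1  # extra callbacks relative to the lower rate

print(f"absolute gap: {absolute_gap * 100:.1f} percentage points")
print(f"relative gap: {relative_gap * 100:.0f}% more callbacks")
```

The same data can therefore be reported as a gap of about three percentage points or as roughly half again as many callbacks, depending on the baseline chosen.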

DeGraffenreid v. General Motors (1976)

  • Five Black women sued GM alleging discrimination
  • The court ruled against them:
    • GM hired women (white women, in clerical roles) and
    • GM hired Black workers (Black men, in industrial roles)
  • so neither a standalone race claim nor a standalone sex claim could succeed
  • the court refused to recognise a combined "Black woman" claim.
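
The logic of the ruling can be illustrated with a small tabulation: both single-axis totals look unproblematic while the intersection is empty. The counts below are hypothetical and chosen only for illustration; they are not figures from the case.

```python
# Hypothetical hiring counts by (race, sex); illustration only, not case data.
hires = {
    ("Black", "woman"): 0,
    ("Black", "man"): 40,
    ("white", "woman"): 30,
    ("white", "man"): 130,
}

def marginal(axis_index, value):
    """Total hires for one value of a single axis (0 = race, 1 = sex)."""
    return sum(n for key, n in hires.items() if key[axis_index] == value)

black_hires = marginal(0, "Black")              # race-only view
women_hires = marginal(1, "woman")              # sex-only view
black_women_hires = hires[("Black", "woman")]   # intersectional view

print("Black hires:", black_hires)
print("Women hires:", women_hires)
print("Black women hires:", black_women_hires)
```

A check that looks only at the two marginals never sees the empty cell, which is the gap the intersectional view makes visible.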

Crenshaw (1989)

Intersectionality

Social perception of faces in a vision-language model

Carina I. Hausladen, Manuel Knott, Colin F. Camerer, Pietro Perona

FAccT 2025

Alignment for Whom?
Bias, Welfare, and Collective Choice in AI


Social choice theory studies the aggregation of individual preferences into collective decisions.

Social Choice Theory

How does Social Choice Theory connect to AI Alignment?

Arrow's impossibility theorem

With three or more alternatives, no ranked voting system can satisfy all four conditions at once

  • Universality
  • Unanimity
  • Non-dictatorship
  • Independence of irrelevant alternatives
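
The difficulty behind the theorem can be seen in the classic Condorcet cycle. This is a standard textbook example, not one taken from the slides: three voters with rotated rankings make pairwise majority voting intransitive. The ballots below are illustrative.

```python
from itertools import permutations

# Each ballot ranks the alternatives from most to least preferred.
ballots = [("A", "B", "C"), ("B", "C", "A"), ("C", "A", "B")]

def majority_prefers(x, y):
    """True if a strict majority of ballots rank x above y."""
    wins = sum(b.index(x) < b.index(y) for b in ballots)
    return wins > len(ballots) / 2

# Pairwise majority relation over all ordered pairs of alternatives.
pairwise = {(x, y): majority_prefers(x, y) for x, y in permutations("ABC", 2)}

# A beats B, B beats C, and C beats A: the collective preference is a cycle.
cycle = pairwise[("A", "B")] and pairwise[("B", "C")] and pairwise[("C", "A")]
print("majority cycle A > B > C > A:", cycle)
```

Each individual ranking is consistent, yet the majority relation has no top choice, so any rule that picks a winner here has to break the cycle by some further criterion.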

There is no aggregation rule that is neutral.

Every rule makes normative commitments (e.g. utilitarian, egalitarian, maximin) and those commitments can be made explicit and compared.

Reinforcement Learning from Human Feedback

[Diagram: Initial Language Model, Reward (Preference) Model, Tuned Language Model, Reinforcement Learning Update]
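
The reward (preference) model stage is commonly trained on pairwise comparisons with a Bradley-Terry style loss. The slides do not state which loss is meant, so the sketch below is an assumption about a typical setup, with hypothetical scalar rewards.

```python
import math

def preference_probability(reward_chosen, reward_rejected):
    """Bradley-Terry probability that the chosen response beats the rejected one."""
    return 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))

def pairwise_loss(reward_chosen, reward_rejected):
    """Negative log-likelihood of one annotator's observed preference."""
    return -math.log(preference_probability(reward_chosen, reward_rejected))

# Hypothetical rewards for two responses to the same prompt.
print(preference_probability(1.0, 1.0))  # equal rewards give probability 0.5
print(pairwise_loss(2.0, 0.5))           # small loss: the model agrees with the label
print(pairwise_loss(0.5, 2.0))           # large loss: the model disagrees with the label
```

Because the training objective sums this loss over all labelled pairs, annotators who disagree with the majority pull the reward in opposite directions and are partly averaged away, which is where the aggregation questions enter.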

Whose disagreement counts more?

Which trade-offs are acceptable?

Alignment documents as data

Alignment Strategies of Five Major AI Companies and Their Consequences

Carina I Hausladen

Working Paper

The corpus

  • 5 AI labs

  • 9 flagship documents

    • 2–3 docs per lab

    • ~150k words, 900 pages overall

    • 2018–2026  

    • mix of technical papers and programmatic blog posts                                                                                                                                                            

       

Five normative lenses

  • Utilitarian / Efficiency 
  • Prioritarian / Catastrophe-averse
  • Egalitarian / Fairness 
  • Rights / Constraints-based 
  • Decision Procedure / Democracy 

Alignment Intuition

  • Make sure nothing goes catastrophically wrong, then protect the worst-off.
  • Make the average user happy, within safety guardrails.
  • Learn what humans want on average. Optimize that.
  • Don't just look at the average, look at the worst-off group.
  • Different contexts get different rules. Be robust across all of them.

SWF Mapping

  • Minimize share below harm threshold, then maximize \( \sum_{i \in I_m} f(u_i(m)) \), with \( f' > 0 \), \( f'' < 0 \).
  • Constrained utilitarianism with safety floor: maximize \( \frac{1}{|I_m|} \sum_{i \in I_m} u_i(m) \).
  • Plain utilitarianism: maximize \( \frac{1}{|I_m|} \sum_{i \in I_m} u_i(m) \).
  • Group maximin: \( W(m) = \min_g U_g(m) \) over demographic groups \( g \).
  • Config-robust: over configurations \( \theta \in \Theta \), use \( \min_{\theta \in \Theta} W(m \mid \theta) \) or its average.
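
The five mappings can be sketched as small scoring functions over a set of ratings. Everything concrete here is an assumption for illustration: the toy ratings, the harm threshold, the tolerance used as the safety floor, the choice of the square root as the concave \( f \), and the reading of \( U_g \) as a group mean. It is not the code used for the working paper.

```python
import math
from statistics import mean

# Hypothetical 0-100 ratings u_i(m) for one model m, keyed by demographic group g.
ratings = {
    "group_a": [80, 75, 90],
    "group_b": [40, 55, 20],
}
HARM_THRESHOLD = 25   # hypothetical harm threshold
MAX_HARM_SHARE = 0.2  # hypothetical tolerance standing in for the safety floor

all_ratings = [u for group in ratings.values() for u in group]

def harm_share(us):
    """Share of ratings below the harm threshold."""
    return sum(u < HARM_THRESHOLD for u in us) / len(us)

def prioritarian(us):
    """Lexicographic key: less harm first, then the sum of a concave f (here sqrt)."""
    return (-harm_share(us), sum(math.sqrt(u) for u in us))

def utilitarian(us):
    """Plain utilitarianism: the mean rating."""
    return mean(us)

def constrained_utilitarian(us):
    """Mean rating, disqualified if too many ratings fall below the threshold."""
    return utilitarian(us) if harm_share(us) <= MAX_HARM_SHARE else -math.inf

def group_maximin(groups):
    """Welfare of the worst-off group, taking U_g as the group mean."""
    return min(mean(us) for us in groups.values())

def config_robust(welfare_by_config):
    """Worst-case welfare across configurations theta."""
    return min(welfare_by_config.values())

print(utilitarian(all_ratings), group_maximin(ratings))
```

Scoring every candidate model with each function and taking the maximum gives one recommended model per rule. Python compares the prioritarian tuples lexicographically, which matches the "first minimize harm, then maximize" ordering.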

  • 1,396 unique evaluators 
  • Three conversation types: unguided, values-guided, controversy-guided.
  • Up to 4 models respond per prompt
  • 21 different models

Kirk et al. (2024)

The PRISM Alignment dataset

Unguided prompt: "Ask, request, or talk to the model about anything. It is up to you!"

Each response is rated on a slider from 0 (Terrible) to 100 (Perfect).


Welfare trade-offs across alignment criteria

  • The companies' philosophies, as operationalized in my code, all point to the model command as the best choice.
  • This is not because command has the highest median rating, but because it scores best under each company's own aggregation rule.

What changes once this distribution is taken seriously?

The alignment frameworks I examine do not incorporate real user preferences.

A single-model solution is insufficient

Only 26.3% of users rank command somewhere in their top 3.

Six models would need to be deployed to achieve Pareto-optimal coverage of 80% of users' top-3 preferences.

In social choice theory, multi-winner elections form a key subclass of aggregation methods.

Those could guide the optimal number of models deployed.
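
One way to make the coverage question concrete is a greedy heuristic in the spirit of coverage-based multi-winner rules: keep adding the model that appears in the top 3 of the most still-uncovered users until a target share is reached. The sketch below uses hypothetical users and placeholder model names. It is not the procedure behind the figures above, and greedy selection does not guarantee the smallest possible set.

```python
# Hypothetical top-3 sets; model names are placeholders, not PRISM results.
top3 = {
    "user_1": {"m1", "m2", "m3"},
    "user_2": {"m1", "m4", "m5"},
    "user_3": {"m2", "m4", "m6"},
    "user_4": {"m5", "m6", "m7"},
    "user_5": {"m3", "m6", "m7"},
}

def greedy_cover(top_sets, target_share):
    """Pick models one at a time, each time covering the most uncovered users."""
    uncovered = set(top_sets)
    chosen = []
    needed = target_share * len(top_sets)
    while len(top_sets) - len(uncovered) < needed:
        # Sorted candidates make ties resolve deterministically by name.
        candidates = sorted({m for u in uncovered for m in top_sets[u]})
        best = max(candidates, key=lambda m: sum(m in top_sets[u] for u in uncovered))
        chosen.append(best)
        uncovered -= {u for u in uncovered if best in top_sets[u]}
    return chosen

selection = greedy_cover(top3, target_share=0.8)
print(selection)
```

With these toy sets no single model reaches the 80% target, so the heuristic returns two models. The size of the returned list is the quantity a multi-winner rule would be asked to justify.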

Performance varies across substantive domains


Discussion

  • We show that major AI developers differ systematically in their normative choices about alignment.
  • We formalize these choices within a social choice framework.
  • The formalized choices induce distinct welfare consequences.
  • The results also underscore the need for democratic input.
    • Certain value conflicts are not fully reconcilable within a single-model paradigm.
    • Model pluralism, deploying several models rather than one, is a possible response.

Alignment for Whom?
Bias, Welfare, and Collective Choice in AI


How well does this work?

LLM Voting: Human Choices and AI Collective Decision-Making

Joshua C. Yang, Damian Dailisan, Marcin Korecki,
Carina I. Hausladen, Dirk Helbing

AAAI/ACM Conference on AI, Ethics, and Society 2024

  • The LLM "population" functions like a small set of personas.
  • This aligns with previous findings that LLMs reflect a single WEIRD (Western, Educated, Industrialized, Rich, Democratic) respondent.

How are votes distributed across projects?

LLMs are sensitive to list orderings.
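
Order sensitivity can be probed by presenting the same options in shuffled orders and comparing the resulting approval tallies. The sketch below is not the experimental setup of the paper: `position_biased_voter` is a hypothetical stand-in for an LLM call that always approves the first items it sees, and the project names are made up.

```python
import random
from collections import Counter

def position_biased_voter(options, k=2):
    """Hypothetical stand-in for an LLM call: approves the first k listed options."""
    return options[:k]

def approval_counts(voter, options, n_orderings=200, seed=0):
    """Tally approvals when the same options are shown in shuffled orders."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_orderings):
        shuffled = options[:]   # copy so the caller's list is left untouched
        rng.shuffle(shuffled)
        counts.update(voter(shuffled))
    return counts

projects = ["park", "bike lane", "library", "playground"]

fixed_order = Counter(position_biased_voter(projects))
shuffled_orders = approval_counts(position_biased_voter, projects)

print("fixed order:", dict(fixed_order))
print("shuffled orders:", dict(shuffled_orders))
```

If a voter's approvals depended only on content, the tallies under shuffling would match the fixed-order result scaled up. A voter with pure position bias instead spreads approvals across all projects, so a winner under one fixed list order is an artefact of that order.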

Each model has its own preferences

Persona aligns topics and erases other-regarding behaviour

The digital-twin frame has problems

  • Approval distributions are narrow
  • Collective outcomes flip on list-order details
  • Topic biases are the model's, not the voter's

Alignment for Whom?
Bias, Welfare, and Collective Choice in AI

[Diagram labels: human centered, participation, information]

VR support is hard to scale.


Information was treated as given.


Can we integrate both? 

How can we keep the immersive parts of VR but make it scalable?

Beyond the Townhall: Spatial Anchoring and LLM Agents for Scalable Participatory Urban Planning

Carina I Hausladen, Javier Argota Sánchez-Vaquerizo, Michael Siebenmann, Arthur Capozzi, Sachit Mahajan, Dirk Helbing

Preprint


Spatial anchoring

Information situated in a structured environment is recalled better than free-floating text.

Dual coding

Verbal and visual channels presented together encode more reliably than either alone.

Event segmentation

Contextual boundaries (e.g. walking through a doorway) chunk experience into discrete units, improving retention.

The walkthrough group remembered more

  • Treatment participants wrote longer answers,
  • used more of the original vocabulary and
  • were significantly more likely to retain numerical facts.

The narrator described the porous pavement as capable of holding several thousand litres of rainwater — "roughly ten bathtubs full". Treatment participants remembered the bathtubs.

Conversations changed

A citizen who has just spent ten minutes mentally walking down the redesigned street has a richer cognitive scaffold for the conversation that follows.


With well-prepared interlocutors, the model becomes a better servant — lighter, more responsive, more useful as a thinking partner than as an oracle.

Treatment

"Fruit trees can turn the area into a meeting place… for strangers to share a fruit."

"Family events… bringing together all of the different generations." "Brotherhood and unity among residents."

Control

"You aren't going to be able to bring tons of fresh, cooled produce on the back of a bike."

"A big cash pay-out may give me some compensation", or simply, "a parking space near my home."

LLMs received positive ratings

  • "I found them very useful and understanding, no improvement is needed."
  • "Gustavo was continually trying to engage, which could be a bit annoying"
  • "They both suffered from a sycophantic nature..."
  • "I think they were neutral and didn't sway me in either direction."
  • "It is aimed at always favoring the project and never really talking about the issues."

Treatment

"Are the newly planted trees well-rooted? Urban trees often fail because their roots don't take."
"Will there be services for leaf pickup and gardening?"

Control

"installation of sun sails", "luxuries like swimming pools".
"removal of parking spaces will destroy local business"; "a 15-min city does not work".

Consultation quality changed

Recall changes the quality of citizens' feedback: from "compensate me for the parking" to "have you thought about leaf pickup".

Conclusion

  • The platform serves as a diagnostic before publication. Inquiries revealed that our informational material omitted project costs and timelines.
  • Make engagement adaptive: Some citizens prefer fewer, sharper questions.
  • Conversational AI is not a substitute for face-to-face deliberation in entrenched cases.
  • Keep both personas, but let citizens choose.
  • Anchor before deliberating. The dialogue improves only when the encoding does. Voice-led 360° walkthroughs are a scalable alternative.

Alignment for Whom?
Bias, Welfare, and Collective Choice in AI

Who got to express their preferences? Who did not?

Whose values are we prioritizing?

Who is being harmed?

Who is being helped?

human centered

carina.hausladen@uni-konstanz.de

Greenwald, McGhee & Schwartz 1998

The Implicit Association Test

Flagship alignment documents

The corpus

  • Anthropic
    • Constitutional AI: Harmlessness from AI Feedback (2022)
    • Sleeper Agents: Training Deceptively Aligned LLMs (2024)
    • Collective Constitutional AI (2023)
  • OpenAI
    • Training Language Models to Follow Instructions with Human Feedback (InstructGPT) (2022)
    • Our Approach to Alignment Research (2023)
    • Rule-Based Rewards and Safety Policies (2023)
  • DeepMind
    • Scalable Agent Alignment via Reward Modeling (2018)
    • Learning to Summarize with Human Feedback (2020)
  • Meta
    • MetaAligner: Multi-Objective Alignment (2026)
    • Beyond Reward Hacking: Causal Rewards (2025)
  • Microsoft
    • Controllable Safety Alignment (CoSA) (2025)
    • Safer Pretraining of Language Models (2023)

Swiss urban planning has a long-standing tradition of offline participation.

However, online participation is not yet very advanced.

Instead, online participation is driven by third parties whose primary goal is profit maximization, rather than democratic dialogue.

What to ask?

Revealed preferences fail to serve individual or societal well-being.

The questions a planning department would like to be asked

  • "Are the newly planted trees well-rooted? Urban trees often fail because their roots don't take."
  • "Will there be services for leaf pickup and gardening?"
  • Specific, granular and locally rooted — exactly the questions a planning department would like to be asked, and exactly the questions the standard sixty-page dossier rarely answers.

UC Berkeley graduate admissions, fall 1973

Simpson’s paradox

Herausforderungen künstlicher Intelligenz: Philosophische Perspektiven (Challenges of Artificial Intelligence: Philosophical Perspectives)

By Carina Ines Hausladen
