Alignment for Whom?
Bias, Welfare, and Collective Choice in AI

Carina I Hausladen

Who got to express their preferences? Who did not?

Whose values are we prioritizing?

Who is being harmed?

Who is being helped?

How do we measure bias in the social sciences?

Stereotypes live in a two-axis space.

Fiske, Cuddy, Glick & Xu 2002

Paternalised: elderly, disabled	Admired: in-group, middle class
Contempt: homeless, drug users	Envied: rich, Jewish (US data)

Competence: Can the group act on its intentions?

Warmth: Is the group cooperative or threatening?

Lakisha: same CV

Emily: same CV

Are Emily and Greg More Employable than Lakisha and Jamal?
Bertrand & Mullainathan (2004)

The audit-study tradition

~50% callback gap

DeGraffenreid v. General Motors (1976)

Five Black women sued GM, alleging discrimination.
The court ruled against them:
- GM hired women (white women, in clerical roles), and
- GM hired Black workers (Black men, in industrial roles),
so neither a standalone race claim nor a standalone sex claim could succeed.
The court refused to recognise a combined "Black woman" claim.

Crenshaw (1989)

Intersectionality

Social perception of faces in a vision-language model

Carina I. Hausladen, Manuel Knott, Colin F. Camerer, Pietro Perona

FAccT 2025

Social choice theory studies the aggregation of individual preferences into collective decisions.

Social Choice Theory

How does Social Choice Theory connect to AI Alignment?

Arrow's impossibility theorem

With three or more alternatives, no ranked voting system can satisfy all four conditions at once:

Universality
Unanimity
Non-dictatorship
Independence of irrelevant alternatives
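
A minimal sketch of why aggregation is hard, using the classic three-voter Condorcet cycle. The example is an illustration added here, not taken from the talk, and the profile and function names are assumptions. Pairwise majority voting yields A over B, B over C, and C over A, so majority rule produces no consistent collective ranking.

```python
from itertools import combinations

# Three voters, each ranking three options from best to worst (toy profile).
profile = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]

def majority_prefers(profile, x, y):
    """Return True if a strict majority of voters rank x above y."""
    wins = sum(ranking.index(x) < ranking.index(y) for ranking in profile)
    return wins > len(profile) / 2

def pairwise_results(profile):
    """Map each pair of options to the winner of the pairwise majority vote."""
    options = sorted(profile[0])
    results = {}
    for x, y in combinations(options, 2):
        if majority_prefers(profile, x, y):
            results[(x, y)] = x
        elif majority_prefers(profile, y, x):
            results[(x, y)] = y
        else:
            results[(x, y)] = None  # exact tie
    return results

results = pairwise_results(profile)
print(results)
```

Each option beats one rival and loses to the other, so any declared winner is rejected by a majority in some pairwise contest.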

There is no aggregation rule that is neutral.

Every rule makes normative commitments (e.g. utilitarian, egalitarian, maximin) and those commitments can be made explicit and compared.

Reinforcement Learning from Human Feedback

[Diagram components: Initial Language Model, Reward (Preference) Model, Tuned Language Model, Reinforcement Learning Update]

Whose disagreement counts more?

Which trade-offs are acceptable?

Alignment documents as data

Alignment Strategies of Five Major AI Companies and Their Consequences

Carina I Hausladen

Working Paper

The corpus

5 AI labs
12 flagship documents
- 2–3 docs per lab
- ~150k words, 900 pages overall
- 2018–2026
- mix of technical papers and programmatic blog posts

Five normative lenses

Utilitarian / Efficiency
Prioritarian / Catastrophe-averse
Egalitarian / Fairness
Rights / Constraints-based
Decision Procedure / Democracy

Alignment Intuition

Make sure nothing goes catastrophically wrong, then protect the worst-off.
Make the average user happy, within safety guardrails.
Learn what humans want on average. Optimize that.
Don't just look at the average, look at the worst-off group.
Different contexts get different rules. Be robust across all of them.

SWF Mapping

Minimize share below harm threshold, then maximize \( \sum_i f(u_i) \), with \( f' > 0 \), \( f'' < 0 \).
Constrained utilitarianism with safety floor: maximize \( \frac{1}{|I_m|} \sum_{i \in I_m} u_i(m) \).
Plain utilitarianism: maximize \( \frac{1}{|I_m|} \sum_{i \in I_m} u_i(m) \).
Group maximin: \( W(m) = \min_g U_g(m) \) over demographic groups \( g \).
Config-robust: over configurations \( \theta \in \Theta \), use \( \min_{\theta \in \Theta} W(m \mid \theta) \) or its average.
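
The five mappings above can be sketched on toy data. This is an illustration under stated assumptions, not the code behind the talk: the ratings, group labels, configuration labels, harm threshold, and the choice of \( f(u) = \sqrt{u} \) are all invented, the safety floor is read as "no rating below the threshold", and configurations are read as conversation types. The numbers are chosen so that the rules disagree; they say nothing about the PRISM results.

```python
import math
from collections import defaultdict

# Toy ratings on a 0-100 scale: (model, group, configuration, rating).
# Every value and label here is invented for illustration.
RATINGS = [
    ("m1", "g1", "unguided", 80), ("m1", "g1", "values", 90),
    ("m1", "g2", "unguided", 20), ("m1", "g2", "values", 70),
    ("m2", "g1", "unguided", 60), ("m2", "g1", "values", 65),
    ("m2", "g2", "unguided", 55), ("m2", "g2", "values", 60),
]
HARM_THRESHOLD = 30  # assumed cut-off below which a rating counts as harm

def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

def utilities(model):
    """All ratings u_i(m) for one model."""
    return [u for m, _, _, u in RATINGS if m == model]

def grouped_means(model, key_index):
    """Mean rating per group (key_index=1) or per configuration (key_index=2)."""
    buckets = defaultdict(list)
    for row in RATINGS:
        if row[0] == model:
            buckets[row[key_index]].append(row[3])
    return {k: mean(v) for k, v in buckets.items()}

def utilitarian(model):
    return mean(utilities(model))

def constrained_utilitarian(model):
    # Assumed reading of the safety floor: ineligible if any rating is below it.
    us = utilities(model)
    return mean(us) if min(us) >= HARM_THRESHOLD else -math.inf

def prioritarian(model):
    # Lexicographic: first minimize the share below the threshold,
    # then maximize the sum of a concave transform (sqrt chosen here).
    us = utilities(model)
    share_below = mean(u < HARM_THRESHOLD for u in us)
    return (-share_below, sum(math.sqrt(u) for u in us))

def group_maximin(model):
    return min(grouped_means(model, 1).values())

def config_robust(model):
    # Worst case over configurations, using the mean as W(m | theta).
    return min(grouped_means(model, 2).values())

RULES = {
    "utilitarian": utilitarian,
    "constrained_utilitarian": constrained_utilitarian,
    "prioritarian": prioritarian,
    "group_maximin": group_maximin,
    "config_robust": config_robust,
}
best = {name: max(("m1", "m2"), key=rule) for name, rule in RULES.items()}
print(best)
```

On this toy data, plain utilitarianism picks `m1` (higher mean) while the other four rules pick `m2`, because `m1` has one rating below the threshold and a weaker worst-off group.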

1,396 unique evaluators
Three conversation types: unguided, values-guided, controversy-guided.
Up to 4 models respond per prompt, drawn from 21 different models.

Kirk et al. (2024)

The PRISM Alignment dataset

[Figure: example rating interface. Prompt: "Ask, request, or talk to the model about anything. It is up to you!" Responses are rated on a slider from 0 (Terrible) to 100 (Perfect).]

Welfare trade-offs across alignment criteria

The companies' philosophies, as operationalized in my code, all point to the model command as the best choice.
This is not because command has the highest median rating, but because it scores best under each company's own rule.

What changes once this distribution is taken seriously?

The alignment frameworks I examine do not incorporate real user preferences.

A single-model solution is insufficient

Only 26.3% of users rank command somewhere in their top 3.

Six models would need to be deployed to achieve Pareto-optimal coverage of 80% of users' top 3 preferences

In social choice theory, multi-winner elections form a key subclass of aggregation methods.

Those could guide the optimal number of models deployed.
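
One way to make the coverage question concrete is a greedy max-coverage heuristic: repeatedly add the model that appears in the top 3 of the most not-yet-covered users, until a target share of users is covered. This is a sketch under assumptions, not the procedure used in the talk. The top-3 sets below are invented and `greedy_cover` is a hypothetical helper. Greedy selection is a standard approximation and is not guaranteed to find the smallest possible set.

```python
def greedy_cover(top3_by_user, target_share):
    """Greedily pick models until at least target_share of users have
    one of their top-3 models deployed.
    Returns (chosen models in pick order, achieved coverage share)."""
    n_users = len(top3_by_user)
    uncovered = set(top3_by_user)
    models = sorted({m for top3 in top3_by_user.values() for m in top3})
    chosen = []
    while (n_users - len(uncovered)) / n_users < target_share:
        # How many still-uncovered users would each remaining model cover?
        gains = {
            m: sum(m in top3_by_user[u] for u in uncovered)
            for m in models if m not in chosen
        }
        if not gains or max(gains.values()) == 0:
            break  # no remaining model helps; target is unreachable
        pick = max(gains, key=gains.get)  # ties go to the alphabetically first model
        chosen.append(pick)
        uncovered = {u for u in uncovered if pick not in top3_by_user[u]}
    return chosen, (n_users - len(uncovered)) / n_users

# Invented top-3 sets for six users over nine models "a".."i".
top3 = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "d", "e"},
    "u3": {"b", "d", "f"},
    "u4": {"e", "f", "g"},
    "u5": {"g", "h", "i"},
    "u6": {"a", "h", "i"},
}
chosen, share = greedy_cover(top3, target_share=0.8)
print(chosen, round(share, 3))
```

Here the single most popular model covers only half of the toy users, and a second model is needed to pass the 80% target.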

Performance varies across substantive domains

We show that major AI developers differ systematically in their normative choices about alignment.
We formalize these choices within a social choice framework.
These choices induce distinct welfare consequences.
The results also underscore the need for democratic input.
- Certain value conflicts are not fully reconcilable within a single-model paradigm.
- Model pluralism

Discussion

How well does this work?

LLM Voting: Human Choices and AI Collective Decision-Making

Joshua C. Yang, Damian Dailisan, Marcin Korecki,
Carina I. Hausladen, Dirk Helbing

AAAI/ACM Conference on AI, Ethics, and Society 2024

The LLM "population" functions like a small set of personas.
This aligns with previous findings that LLMs reflect a single WEIRD respondent.

How are votes distributed across projects?

LLMs are sensitive to list orderings.

Each model has its own preferences

Persona aligns topics & erases other-regarding behaviour

The digital-twin frame has problems

Approval distributions are narrow
Collective outcomes flip on list-order details
Topic biases are the model's, not the voter's
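
A rough sketch of how list-order sensitivity could be measured: show the same projects in every possible order to a voter function and count how often the approval ballot differs from the ballot under the original order. Both voter functions are hypothetical stand-ins for an LLM call, and the project names are invented, so the output illustrates the measurement and not any model's behaviour.

```python
from itertools import permutations

PROJECTS = ["park", "bike lane", "library"]  # invented project names

def position_biased_voter(options, k=2):
    """Stand-in voter that approves the first k options it is shown."""
    return set(options[:k])

def content_voter(options):
    """Stand-in voter that approves a fixed set regardless of order."""
    return {"park", "library"} & set(options)

def order_sensitivity(voter, options):
    """Share of orderings whose ballot differs from the original-order ballot."""
    baseline = frozenset(voter(list(options)))
    orders = list(permutations(options))
    changed = sum(frozenset(voter(list(order))) != baseline for order in orders)
    return changed / len(orders)

print(order_sensitivity(content_voter, PROJECTS))
print(order_sensitivity(position_biased_voter, PROJECTS))
```

A score of zero means the ballot depends only on content. Anything above zero means that reordering the list alone can change what gets approved.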

human centered

[Diagram: information and participation]

VR support is hard to scale.

Information was treated as given.

Can we integrate both?

How can we keep the immersive parts of VR but make it scalable?

Beyond the Townhall: Spatial Anchoring and LLM Agents for Scalable Participatory Urban Planning

Carina I Hausladen, Javier Argota Sánchez-Vaquerizo, Michael Siebenmann, Arthur Capozzi, Sachit Mahajan, Dirk Helbing

Preprint

Spatial anchoring: Information situated in a structured environment is recalled better than free-floating text.

Dual coding: Verbal and visual channels presented together encode more reliably than either alone.

Event segmentation: Contextual boundaries (e.g. walking through a doorway) chunk experience into discrete units, improving retention.

Treatment participants wrote longer answers,
used more of the original vocabulary and
were significantly more likely to retain numerical facts.

The walkthrough group remembered more

The narrator described the porous pavement as capable of holding several thousand litres of rainwater — "roughly ten bathtubs full". Treatment participants remembered the bathtubs.

Conversations changed

A citizen who has just spent ten minutes mentally walking down the redesigned street has a richer cognitive scaffold for the conversation that follows.

With well-prepared interlocutors, the model becomes a better servant — lighter, more responsive, more useful as a thinking partner than as an oracle.

Treatment

"Fruit trees can turn the area into a meeting place… for strangers to share a fruit."

"Family events… bringing together all of the different generations." "Brotherhood and unity among residents."

Control

"You aren't going to be able to bring tons of fresh, cooled produce on the back of a bike."

"A big cash pay-out may give me some compensation", or simply, "a parking space near my home."

LLMs received positive ratings

"I found them very useful and understanding, no improvement is needed."
"Gustavo was continually trying to engage, which could be a bit annoying."
"They both suffered from a sycophantic nature..."
"I think they were neutral and didn't sway me in either direction."
"It is aimed at always favoring the project and never really talking about the issues."

Treatment

"Are the newly planted trees well-rooted? Urban trees often fail because their roots don't take."
"Will there be services for leaf pickup and gardening?"

Control

"installation of sun sails", "luxuries like swimming pools".
"removal of parking spaces will destroy local business"; "a 15-min city does not work".

Consultation quality changed

Recall changes the quality of citizens' feedback: from "compensate me for the parking" to "have you thought about leaf pickup".

The platform serves as a diagnostic before publication. Inquiries revealed that our informational material omitted project costs and timelines.
Make engagement adaptive: Some citizens prefer fewer, sharper questions.
Conversational AI is not a substitute for face-to-face deliberation in entrenched cases.
Keep both personas, but let citizens choose.
Anchor before deliberating. The dialogue improves only when the encoding does. Voice-led 360° walkthroughs are a scalable alternative.

Conclusion

sustainableswitzerland.ch/artikel

Who got to express their preferences? Who did not?

Whose values are we prioritizing?

Who is being harmed?

Who is being helped?

human centered

carina.hausladen@uni-konstanz.de

slides.com/carinah

github.com/carinahausladen

Greenwald, McGhee & Schwartz 1998

The Implicit Association Test

Flagship alignment documents

The corpus

Anthropic
- Constitutional AI: Harmlessness from AI Feedback (2022)
- Sleeper Agents: Training Deceptively Aligned LLMs (2024)
- Collective Constitutional AI (2023)
OpenAI
- Training Language Models to Follow Instructions with Human Feedback (InstructGPT) (2022)
- Our Approach to Alignment Research (2023)
- Rule-Based Rewards and Safety Policies (2023)

DeepMind
- Scalable Agent Alignment via Reward Modeling (2018)
- Learning to Summarize with Human Feedback (2020)
Meta
- MetaAligner: Multi-Objective Alignment (2026)
- Beyond Reward Hacking: Causal Rewards (2025)
Microsoft
- Controllable Safety Alignment (CoSA) (2025)
- Safer Pretraining of Language Models (2023)

Swiss urban planning has a long-standing tradition of offline participation.

However, online participation is not yet very advanced.

Instead, online participation is driven by third parties whose primary goal is profit maximization, rather than democratic dialogue.

What to ask?

Revealed preferences fail to serve individual or societal well-being.

(Kleinberg 2022, Tasioulas 2022)

The questions a planning department would like to be asked

"Are the newly planted trees well-rooted? Urban trees often fail because their roots don't take."
"Will there be services for leaf pickup and gardening?"
Specific, granular and locally rooted — exactly the questions a planning department would like to be asked, and exactly the questions the standard sixty-page dossier rarely answers.

UC Berkeley graduate admissions, fall 1973
