Carina I. Hausladen, Manuel Knott, Colin F. Camerer, Pietro Perona

Social perception of faces in a CLIP vision-language model

"Make it More" Trend

by @kevinschawinski

Ok, make it more Swiss

MORE SWISS

Moooorrreeee Swisssss!!

Bloomberg, March 8th, 2024

Humans spontaneously make social judgments from photographs of faces.

(Oosterhof 2008, Sutherland 2018, Todorov 2017)

Here, we investigate whether vision-language models can also do so.

Understanding socially relevant behaviors of AI systems is necessary to use them responsibly.

We measure social perception of human faces in a vision-language model.

CLIP is a state-of-the-art vision-language model that maps images and text into a shared embedding space.
It is used for tasks such as image recognition and captioning,
and it powers applications such as DALL·E.

A photo of a person

Measuring Social Perception via Cosine Similarity

FairFace, Karkkainen et al. (2021)
UTKFace, Zhang et al. (2017)
CausalFace, Liang et al. (2023)

Attributes varied in CausalFace
Legally Protected: female, male, age, Asian, Black, White
Non-protected: smiling, lighting, pose
Stereotype Content Model, Fiske et al. (2007)
Warmth, Competence
unfriendly, friendly

Agency Belief Communion Model, Koch et al. (2016)
Agency (+ / –), Belief (C / P: conservative / progressive), Communion (– / +)

surgeon, parent


1. We deploy a new "causal" dataset.

2. We use theories of social perception to generate prompts.

3. We measure cosine similarity of trait words and face features.
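The cosine-similarity measurement in the third step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the three-dimensional vectors are made-up stand-ins for CLIP image and text embeddings, and the names `face`, `trait_friendly`, and `trait_unfriendly` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins (assumption): real CLIP embeddings have hundreds of dimensions.
face = [0.2, 0.9, 0.1]              # embedding of one face image
trait_friendly = [0.1, 1.0, 0.0]    # embedding of "a photo of a friendly person"
trait_unfriendly = [1.0, -0.2, 0.3] # embedding of "a photo of an unfriendly person"

sim_friendly = cosine_similarity(face, trait_friendly)
sim_unfriendly = cosine_similarity(face, trait_unfriendly)

# A higher similarity means the face embedding lies closer to that trait prompt.
print(sim_friendly > sim_unfriendly)
```

With real embeddings, the same function would be applied to each face image and each trait prompt.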

FairFace

UTKFace

CausalFace

How does CausalFace compare to wild-collected datasets?

Commonly used bias metrics

Markedness (Wolfe and Caliskan, 2022)
WEAT (Caliskan et al., 2017)
Skew@k, NDKL (Geyik et al., 2019)
Mean cosine similarities

Markedness

unmarked: a photo of a person
marked: a photo of a WHITE person

Markedness (%)
image category	CausalFace	FairFace	UTKFace
white	45.5	47.09	32.6
black	0.7	1.8	2.9
asian	0.1	1.9	4.1
male	0.4	0.00	20.1
female	0.6	0.00	11.6
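A markedness percentage of this kind can be computed along the following lines. This sketch assumes, following Wolfe and Caliskan (2022), that the value is the share of images in a category for which the unmarked prompt scores higher than the marked one; the direction of the comparison, the function name `unmarked_preference_rate`, and the two-dimensional toy embeddings are all assumptions for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def unmarked_preference_rate(image_embs, unmarked_emb, marked_emb):
    """Percent of images that are closer to the unmarked prompt than to the marked prompt."""
    wins = sum(
        cosine(img, unmarked_emb) > cosine(img, marked_emb)
        for img in image_embs
    )
    return 100.0 * wins / len(image_embs)

# Toy embeddings (assumption): stand-ins for CLIP outputs.
unmarked = [1.0, 0.0]  # "a photo of a person"
marked = [0.0, 1.0]    # "a photo of a WHITE person"
images = [[1.0, 0.1], [0.1, 1.0], [0.9, 0.2], [0.2, 0.8]]

rate = unmarked_preference_rate(images, unmarked, marked)
print(rate)
```

In this toy set, two of the four images lie closer to the unmarked prompt, so the rate is 50%.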


Legally "Protected" Attributes

female

male

age

Asian

Black

White

Non-protected Attributes

smiling

lighting

pose

How do protected and non-protected attributes affect social perception?

Bootstrapping Differences
smiling
protected and non-protected attributes

How does age-related social perception compare across datasets?

Warmth, Competence, Agency (– / +), Belief, Communion (– / +)

Agency (+): 💼 Powerful, 👑 High status, 🦁 Dominating, 💰 Wealthy, 💪 Confident, 🏆 Competitive
Agency (–): 🍂 Powerless, 📉 Low-status, 🌾 Dominated, 🪙 Poor, 🐭 Meek, 🍂 Passive

Agency (– / +) from youngest to oldest: UTKFace, FairFace, CausalFace

Positive Agency, Black Women: example identity, youngest to oldest

The observation that Black women are a special category in the social perception of age is consistent with human subject research.
The 'Strong Black Woman' ideal is reinforced with age (Baker 2015).

age, smiling, female, male, Asian, Black, White

Smiling
Negative Agency, Conservative Belief, Negative Communion
Positive Agency, Progressive Belief, Positive Communion
Warmth, Competence

Sample identity: most frowning to most smiling
Black Women, Conservative Belief: most frowning to most smiling

Limitations

Attribute Manipulation Effectiveness
- Effects of lighting, facial expressions, etc. might differ across demographic groups.
Residual Confounds?
- Some color confounds might still be present despite controls for background, clothing, and hair color.
Dataset vs. Model Bias
- We use three datasets but only one CLIP model, and we do not compare against human ratings.

The impact of protected and non-protected characteristics is comparable in size.

Social perception of age shows six clustered race-gender groups in CausalFace.

Strongly diverging age effects for Black women.

Strong impact of smiling on positive social perception for Black women.

carinah@ethz.ch

slides.com/carinah

https://github.com/carinahausladen

Appendix

Smiling

Word Embedding Association Test (WEAT)

Caliskan et al. (2017)

$\text{WEAT} = \dfrac{\overline{\cos}(i_{\text{asian}}, t) - \overline{\cos}(i_{\text{black}}, t)}{\text{pooled sd}}$, with $t$ = "photo of a warm person"

Kruskal-Wallis $\chi^2$ = 1.6, p-value = 0.4
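A WEAT-style effect size for two image groups and one prompt can be sketched as below. It assumes that "pooled sd" means the sample standard deviation over both groups combined, as in the effect size of Caliskan et al. (2017); the function name `weat_effect_size` and the similarity values are made up for illustration.

```python
import statistics

def weat_effect_size(sims_a, sims_b):
    """Difference of mean cosine similarities, scaled by the sd over both groups.

    sims_a, sims_b: cosine similarities between one text prompt and the
    images of group A and group B, respectively.
    """
    mean_diff = statistics.mean(sims_a) - statistics.mean(sims_b)
    pooled_sd = statistics.stdev(list(sims_a) + list(sims_b))
    return mean_diff / pooled_sd

# Made-up similarities to the prompt "photo of a warm person" (assumption).
sims_asian = [0.30, 0.32, 0.34]
sims_black = [0.20, 0.22, 0.24]

effect = weat_effect_size(sims_asian, sims_black)
print(effect > 0)
```

A positive value means the first group is, on average, closer to the prompt; swapping the groups flips the sign.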

protected and non-protected attributes

–

+

Theoretical Models

Statistical discrimination (Arrow, 1998)

Unfair treatment of ethnic minorities can result from rational actions executed by profit-maximizing actors who are confronted with the uncertainties accompanying selection decisions.

Taste-based discrimination (Becker, 2010)

Discriminatory behavior is the result of people’s unfavorable attitudes toward ethnic minorities.

Prompt templates

A photo of a <attribute> person.
A <attribute> person.
This is a <attribute> person.
Cropped face photo of a <attribute> person.
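Filling these templates for a given trait word amounts to a string substitution. The sketch below uses the four templates as listed; the helper name `fill_templates` and the example attribute are hypothetical.

```python
TEMPLATES = [
    "A photo of a <attribute> person.",
    "A <attribute> person.",
    "This is a <attribute> person.",
    "Cropped face photo of a <attribute> person.",
]

def fill_templates(attribute, templates=TEMPLATES):
    """Return one prompt per template with the placeholder replaced by the attribute."""
    return [t.replace("<attribute>", attribute) for t in templates]

prompts = fill_templates("friendly")
for p in prompts:
    print(p)
```

The substitution is purely textual, so it does not adjust the article for attributes that start with a vowel.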

Bootstrapping Variations

We randomly choose two distinct values, $x_1,x_2 \sim X$, for the chosen dimension (e.g., white and black).
For each pair of values, we select the respective image embeddings, $i_1(x=x_1)$ and $i_2(x=x_2)$, that are equal in all other dimensions (in this example: gender, age, smiling, lighting, and pose).
We then compute the difference in cosine similarities between each image embedding and a text embedding $t$, defined as $\Delta(t, i_1, i_2) = \lvert \cos(i_1, t) - \cos(i_2, t) \rvert$.
This process is repeated 1,000 times, generating a bootstrap distribution of $ \Delta $ values.
This distribution describes the impact of the specific dimension on the cosine similarity of image embeddings and text embedding.
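The procedure above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: embeddings are stored in a dictionary keyed by a tuple of attribute values, every combination of attribute values is present, and the names `bootstrap_deltas`, `grid`, and the toy vectors are hypothetical.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def bootstrap_deltas(embeddings, dim, text_emb, n_iter=1000, seed=0):
    """Sample |cos(i1, t) - cos(i2, t)| for image pairs that differ only in `dim`.

    embeddings: dict mapping a tuple of attribute values to an image embedding.
    dim: index (within the tuple) of the attribute being varied.
    """
    rng = random.Random(seed)
    keys = sorted(embeddings)
    values = sorted({k[dim] for k in keys})
    deltas = []
    for _ in range(n_iter):
        x1, x2 = rng.sample(values, 2)   # two distinct values of the chosen dimension
        base = rng.choice(keys)          # fixes all other dimensions
        k1 = base[:dim] + (x1,) + base[dim + 1:]
        k2 = base[:dim] + (x2,) + base[dim + 1:]
        deltas.append(abs(cosine(embeddings[k1], text_emb) - cosine(embeddings[k2], text_emb)))
    return deltas

# Toy grid (assumption): keys are (race, smiling), values are stand-in embeddings.
grid = {
    ("white", 0): [1.0, 0.0],
    ("black", 0): [0.8, 0.6],
    ("white", 1): [0.6, 0.8],
    ("black", 1): [0.0, 1.0],
}
text = [1.0, 0.0]

deltas = bootstrap_deltas(grid, dim=0, text_emb=text)
print(len(deltas), min(deltas), max(deltas))
```

Running the same function with a different `dim` gives the distribution for another attribute, which is what allows protected and non-protected attributes to be compared on the same scale.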

Heatmap of Pearson correlation coefficients of positive and negative valence dimensions of the ABC model.

How does Facial Expression impact Social Perception?

Smiling
a photo of a person
a photo of a liberal person

Belief (progressive), Agency +, Communion +, Warmth, Competence
$\Delta$ Cosine Similarity (%)

Progressive Belief
Gender: Females, Males
Race: Asian, Black, White
Black Women

Progressive Belief: 🔬 Science-Oriented, 🔄 Alternative, 🕊️ Liberal, 📱 Modern

How does CausalFace compare to wild-collected datasets w.r.t. gender and race?

FairFace

UTKFace

CausalFace

Commonly used bias-metrics

Markedness

unmarked: a photo of a person
marked: a photo of a WHITE person

Markedness (%)
image category	CausalFace	FairFace	UTKFace
white	45.50	47.09	32.6
black	0.68	1.88	2.9
asian	0.05	1.85	4.1
male	0.42	0.00	20.1
female	0.64	0.00	11.6
