Racial Bias in Gender Recognition: Facial recognition technology performs poorly on sub-populations that are underrepresented in training and evaluation data (Buolamwini & Gebru, 2018)
Gender Stereotypes in Word Embeddings:
Miscalibration in Recommendation: Designing recommendations which optimize engagement leads to over-recommending the most prevalent types (Steck, 2018)
Example: a user clicks on a rom-com recommendation 80% of the time
and on a horror recommendation 20% of the time
The recommender optimizes for the probability of a click
$$\mathbb P(\mathsf{click})= \mathbb P(\mathsf{click}\mid \mathsf{romcom}) \mathbb P(\mathsf{romcom}) + \mathbb P(\mathsf{click}\mid \mathsf{horror}) \mathbb P(\mathsf{horror})$$
where \(\mathbb P(\mathsf{horror}) = 1-\mathbb P(\mathsf{romcom})\), since every recommendation is either rom-com or horror
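$$\max_{0\leq p\leq 1}0.8\times p + 0.2 \times (1-p)$$
Here \(p\) is the fraction of recommendations that are rom-com. The objective equals \(0.2 + 0.6\,p\), which increases in \(p\), so the maximizer is \(p=1\): recommend rom-com 100% of the time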
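The rom-com/horror example can be checked numerically. The following is a minimal sketch, not taken from Steck (2018): the function names `expected_click_rate` and `best_share` are hypothetical, and the click probabilities 0.8 and 0.2 are the ones from the example above.

```python
def expected_click_rate(p, click_romcom=0.8, click_horror=0.2):
    """Expected click probability when a fraction p of recommendations is rom-com."""
    return click_romcom * p + click_horror * (1.0 - p)


def best_share(grid_size=101):
    """Search a grid of p values in [0, 1] for the one with the highest click rate."""
    candidates = [i / (grid_size - 1) for i in range(grid_size)]
    return max(candidates, key=expected_click_rate)


# The engagement-maximizing share of rom-com recommendations.
p_star = best_share()

# A calibrated recommender would instead match the user's 80/20 split.
calibrated_rate = expected_click_rate(0.8)
maximal_rate = expected_click_rate(p_star)

print(f"engagement-optimal p = {p_star}, click rate = {maximal_rate:.2f}")
print(f"calibrated p = 0.8, click rate = {calibrated_rate:.2f}")
```

Because the objective is linear in p, the optimum always sits at an endpoint. The calibrated recommender gives up some expected clicks in exchange for still showing horror one time in five.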
Gender Bias in Translation: Data-driven machine translation perpetuates gender bias.
(screenshot from fall 2023)
Machine translation works by maximizing the probability of an English sentence given a Hungarian sentence
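One way to see where the bias can enter is to write the objective out. This is a sketch under a classical noisy-channel assumption (modern neural systems typically model the conditional probability directly). The symbols \(h\) for the Hungarian sentence and \(e\) for a candidate English sentence are introduced here for illustration:

$$\hat e = \arg\max_{e} \mathbb P(e \mid h) = \arg\max_{e} \mathbb P(h \mid e)\, \mathbb P(e)$$

The Hungarian third-person pronoun "ő" does not mark gender, so two English candidates that differ only in "he" versus "she" explain \(h\) about equally well. Under this decomposition the choice between them is driven by \(\mathbb P(e)\), that is, by how often each phrasing appears in the English training text.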
Spring 2026 update:
Bias (regardless of source) is amplified when ML is deployed
Risk of re-identification from "anonymized" data
Now, personal browsing information is more rarely released for public research
...but companies still collect, use, and sell it outside public view
themarkup.org
An old idea about ensuring privacy from survey studies...
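One classic survey technique of this kind is randomized response. Whether it is the idea meant here is an assumption, since the slide does not name it. In the sketch below, each respondent flips a coin: heads, they answer truthfully; tails, they answer according to a second coin flip. No individual answer reveals the truth, yet the population rate can still be estimated. The function names and the simulated 30% true rate are hypothetical.

```python
import random


def randomized_response(truth, rng):
    """Return a noisy answer: truthful with probability 1/2, a fair coin otherwise."""
    if rng.random() < 0.5:
        return truth
    return rng.random() < 0.5


def estimate_true_rate(responses):
    """Invert P(yes) = 0.5 * rate + 0.25 to recover the underlying rate."""
    observed_yes = sum(responses) / len(responses)
    return 2.0 * observed_yes - 0.5


# Simulated survey (assumed numbers): 30% of respondents truly answer "yes".
rng = random.Random(0)
population = [rng.random() < 0.3 for _ in range(100_000)]
responses = [randomized_response(t, rng) for t in population]
estimate = estimate_true_rate(responses)

print(f"estimated rate of 'yes': {estimate:.3f}")
```

The price of this deniability is extra variance: the estimate is noisier than one computed from truthful answers, so larger samples are needed for the same accuracy.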
Is re-identification possible when only a model, not a dataset, is released?
Making re-identification unlikely with differential privacy
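A common way to achieve differential privacy for a counting query is the Laplace mechanism: add Laplace noise with scale 1/ε to the true count, since adding or removing one person changes a count by at most 1. The following is a minimal sketch under that assumption. The helper names, the toy `ages` data, and the choice ε = 1 are hypothetical, and a real deployment would also need to track the privacy budget across repeated queries.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon, rng):
    """Release a count with Laplace noise calibrated to sensitivity 1."""
    true_count = sum(1 for r in records if predicate(r))
    # One person changes a count by at most 1, so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Toy data (assumed): ages of six individuals.
ages = [23, 35, 41, 52, 67, 29]
rng = random.Random(0)

noisy = private_count(ages, lambda age: age > 40, epsilon=1.0, rng=rng)
print(f"noisy count of people over 40: {noisy:.2f}")
```

Smaller values of ε give stronger privacy but noisier answers. The noise is zero-mean, so averages over many independent releases approach the true count, which is why repeated queries consume privacy budget.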
...but Google was prevented from allowing access to the full text of the books themselves.
(even out-of-print books that are otherwise impossible to access)
Generative AI has pushed "fair use" to its limits
If history is any indicator…