CS 194/294 Guest Lecture, 3/1/21
Bias: prejudice, usually in a way considered to be unfair
(mathematical) systematic distortion of a statistical result
Prejudice: preconceived opinion
(legal) harm or injury resulting from some action or judgment
Algorithm: a set of rules to be followed, especially by a computer
Promises of algorithmic screening
"I usually look for candidates who will be a good fit"
Promises of algorithmic data-driven screening
Hand-coded algorithms are hard!
Machine learning extracts statistical patterns from employment records to automatically generate accurate and optimal classifications
Federal laws make it illegal to discriminate on the basis of race, color, religion, sex, national origin, pregnancy, disability, age, and genetic information.
Important quantities:
features \(X\) and label \(Y\)
(e.g. resume and employment outcome)
classifier \(c(X) = \hat Y\)
(e.g. interview invitation)
Accuracy
\(\mathbb P( \hat Y = Y)\) = ________
Positive rate
\(\mathbb P( \hat Y = 1)\) = ________
False positive rate
\(\mathbb P( \hat Y = 1\mid Y = 0)\) = ________
False negative rate
\(\mathbb P( \hat Y = 0\mid Y = 1)\) = ________
Positive predictive value
\(\mathbb P( Y = 1\mid\hat Y = 1)\) = ________
Negative predictive value
\(\mathbb P( Y = 0\mid\hat Y = 0)\) = ________
[Figure: worked example showing features \(X\), points with true labels \(Y=1\) and \(Y=0\), and the classifier \(c(X)\). For this example: accuracy \(= 3/4\), positive rate \(= 9/20\), false positive rate \(= 1/5\), false negative rate \(= 3/10\), positive predictive value \(= 7/9\), negative predictive value \(= 8/11\).]
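For concreteness, here is a minimal Python sketch that recovers each quantity from a toy dataset. The labels are an assumption, reconstructed from the counts the fractions above imply (7 true positives, 2 false positives, 3 false negatives, 8 true negatives).

```python
import numpy as np

# Hypothetical labels reconstructed from the worked example:
# 7 true positives, 2 false positives, 3 false negatives, 8 true negatives.
y     = np.array([1]*7 + [0]*2 + [1]*3 + [0]*8)   # true label Y
y_hat = np.array([1]*7 + [1]*2 + [0]*3 + [0]*8)   # classifier output Y_hat = c(X)

accuracy      = np.mean(y_hat == y)      # P(Y_hat = Y)         = 3/4
positive_rate = np.mean(y_hat == 1)      # P(Y_hat = 1)         = 9/20
fpr = np.mean(y_hat[y == 0] == 1)        # P(Y_hat = 1 | Y = 0) = 1/5
fnr = np.mean(y_hat[y == 1] == 0)        # P(Y_hat = 0 | Y = 1) = 3/10
ppv = np.mean(y[y_hat == 1] == 1)        # P(Y = 1 | Y_hat = 1) = 7/9
npv = np.mean(y[y_hat == 0] == 0)        # P(Y = 0 | Y_hat = 0) = 8/11
```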
features \(X\) and label \(Y\)
(e.g. resume and employment outcome)
classifier \(c(X) = \hat Y\)
(e.g. interview invitation)
individual with protected attribute \(A\)
(e.g. race or gender)
Independence: decision does not depend on \(A\)
\(\hat Y \perp A\)
e.g. applicants are accepted at equal rates across gender
Separation: given outcome, decision does not depend on \(A\)
\(\hat Y \perp A~\mid~Y\)
e.g. qualified applicants are accepted at equal rates across gender
Sufficiency: given decision, outcome does not depend on \(A\)
\( Y \perp A~\mid~\hat Y\)
e.g. accepted applicants are qualified at equal rates across gender
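A rough way to check these criteria empirically is to compare group-conditional rates. The sketch below assumes hypothetical arrays `y` (true labels), `y_hat` (decisions), and `a` (protected attribute); in practice, exact equality would be replaced by a tolerance or a statistical test.

```python
import numpy as np

def fairness_report(y_hat, y, a):
    """Compare group-conditional rates for a binary classifier.
    Independence: acceptance rates equal across groups.
    Separation:   false positive / false negative rates equal across groups.
    Sufficiency:  positive / negative predictive values equal across groups."""
    for g in np.unique(a):
        m = (a == g)
        print(f"group {g}")
        print("  P(Yhat=1 | A)      =", np.mean(y_hat[m] == 1))             # independence
        print("  P(Yhat=1 | Y=0, A) =", np.mean(y_hat[m & (y == 0)] == 1))  # separation (FPR)
        print("  P(Yhat=0 | Y=1, A) =", np.mean(y_hat[m & (y == 1)] == 0))  # separation (FNR)
        print("  P(Y=1 | Yhat=1, A) =", np.mean(y[m & (y_hat == 1)] == 1))  # sufficiency (PPV)
        print("  P(Y=0 | Yhat=0, A) =", np.mean(y[m & (y_hat == 0)] == 0))  # sufficiency (NPV)
```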
“Black defendants who did not recidivate over a two-year period were nearly twice as likely to be misclassified. [...] White defendants who re-offended within the next two years were mistakenly labeled low risk almost twice as often.”
“In comparison with whites, a slightly lower percentage of blacks were ‘Labeled Higher Risk, But Didn’t Re-Offend.’ [...] A slightly higher percentage of blacks were ‘Labeled Lower Risk, Yet Did Re-Offend.’”
\(\mathbb P(\hat Y = 1\mid Y=0, A=\text{Black})> \mathbb P(\hat Y = 1\mid Y=0, A=\text{White}) \)
\(\mathbb P(\hat Y = 0\mid Y=1, A=\text{Black})< \mathbb P(\hat Y = 0\mid Y=1, A=\text{White}) \)
\(\mathbb P(Y = 0\mid \hat Y=1, A=\text{Black})\approx \mathbb P( Y = 0\mid \hat Y=1, A=\text{White}) \)
\(\mathbb P(Y = 1\mid \hat Y=0, A=\text{Black})\approx \mathbb P( Y = 1\mid \hat Y=0, A=\text{White}) \)
COMPAS risk predictions do not satisfy separation
COMPAS risk predictions do (approximately) satisfy sufficiency
If we use machine learning to design a classification algorithm, how do we ensure nondiscrimination?
Attempt #1: Remove protected attribute \(A\) from features
Attempt #2: Careful algorithmic calibration
Pre-existing Bias: exists independently, usually prior to the creation of the system, with roots in social institutions, practices, and attitudes
Technical Bias: arises from technical constraints and considerations; limitations of formalisms and quantification of the qualitative
Emergent Bias: arises in context of use as a result of changing societal knowledge, population, or cultural values
Classic taxonomy by Friedman & Nissenbaum (1996)
[Diagram: the machine learning pipeline. Measurement turns individuals \(X\) into training data \((X_i, Y_i)\); learning produces a model \(c:\mathcal X\to \mathcal Y\); the model maps an individual \(X\) to a prediction \(\hat Y\), which determines an action.]
[Diagram: pipeline with the measurement stage and the training data \((X_i, Y_i)\) highlighted]
Existing inequalities can manifest as pre-existing bias.
Technical bias may result from the process of constructing a dataset.
[Diagram: pipeline with the learning stage highlighted, from training data \((X_i, Y_i)\) to model \(c:\mathcal X\to \mathcal Y\)]
Further technical bias results from formulating the learning task and training the model: optimization bias.
e.g. optimizing for average accuracy will prioritize majority groups
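A minimal sketch of this effect, with hypothetical group sizes and base rates (a 90% majority group with base rate 0.8 and a 10% minority group with base rate 0.2): a classifier with no informative features that maximizes average accuracy ends up matching the majority group and failing the minority group.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: 90% majority group (base rate 0.8), 10% minority group (base rate 0.2).
y_majority = rng.binomial(1, 0.8, size=9000)
y_minority = rng.binomial(1, 0.2, size=1000)
y_all = np.concatenate([y_majority, y_minority])

# With no informative features, the accuracy-maximizing classifier is a constant prediction,
# chosen to match the overall majority label (driven by the larger group).
constant = int(y_all.mean() > 0.5)

print("prediction:", constant)                                        # 1
print("accuracy, majority group:", np.mean(y_majority == constant))   # ~0.80
print("accuracy, minority group:", np.mean(y_minority == constant))   # ~0.20
```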
Designing recommendations that optimize engagement leads to over-recommending the most prevalent types (Steck, 2018):
e.g. a user watches rom-coms 80% of the time and horror 20% of the time;
a recommender that optimizes for the most probable click recommends rom-coms 100% of the time.
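A tiny sketch of the same argument, using the hypothetical 80/20 split above: always recommending the item with the highest click probability yields a recommendation distribution far more skewed than the user's own viewing distribution.

```python
# Hypothetical click probabilities for one user, matching the 80/20 split above.
click_prob = {"rom-com": 0.8, "horror": 0.2}

# Maximizing the probability of a click means always recommending the argmax ...
recommendation = max(click_prob, key=click_prob.get)
print(recommendation)   # "rom-com", i.e. 100% of recommendations

# ... so the recommended distribution (100% / 0%) is more extreme than the user's
# actual preferences (80% / 20%).
```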
Small Is Beautiful: Economics as if People Mattered, E.F. Schumacher:
Emergent bias results from real-world dynamics, including those induced by the decisions themselves
[Diagram: pipeline with the action stage highlighted: the model \(c:\mathcal X\to \mathcal Y\) maps an individual \(X\) to a prediction \(\hat Y\), which determines an action]
e.g. Goodhart's law: "When a measure becomes a target, it ceases to be a good measure."
Predictive policing models predict crime rate across locations based on previously recorded crimes
PredPol analyzed by Lum & Isaac (2016)
[Maps: recorded drug arrests, police deployment, and estimated actual drug use]
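A stylized sketch of this feedback loop (not PredPol's actual model; the neighborhoods and numbers are hypothetical): arrests are recorded where police patrol, and police patrol where arrests were recorded, so a small initial gap in recorded arrests grows even though actual drug use is identical.

```python
import numpy as np

# Two neighborhoods with identical actual drug use; neighborhood 0 happens to start
# with a few more recorded arrests (hypothetical numbers).
actual_use = np.array([0.5, 0.5])
recorded_arrests = np.array([55.0, 45.0])

for day in range(30):
    # The model forecasts crime from recorded arrests, so patrols go to the
    # neighborhood with the most recorded arrests ...
    target = int(np.argmax(recorded_arrests))
    # ... and new arrests can only be recorded where patrols are present.
    recorded_arrests[target] += actual_use[target]

print(recorded_arrests)   # the initial gap widens every day, despite identical actual use
```

Because recorded arrests are both the model's input and the quantity the deployment policy optimizes, the recorded data drift further from estimated actual use, echoing Goodhart's law.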
image cropping
facial recognition
information retrieval
generative models