https://xkcd.com/1269/
Take a moment to write on your sheet:
"Sampling ideas, viewpoints, and aesthetics without being unduly judged by or associated with them are part of learning, maturing, becoming individuals, figuring out the world on our own terms"
Neil Richards (law professor)
Privacy is "the right to be left alone"
Warren and Brandeis, Harvard Law Review, 1890
Universal Declaration of Human Rights
"All human beings have three lives: public, private, and secret"
Gabriel García Márquez (literary author)
"Just because something is publicly accessible does not mean that people want it to be publicized"
danah boyd (technology & social media scholar)
"Privacy is what allows us to determine who we are and who we want to be."
Edward Snowden, NSA whistleblower
US Tort Law: Freedom from "intrusion of solitude, public disclosure of private facts, false light, and appropriation"
William Prosser, California Law Review, 1960
"Privacy is the ability to control one's reputation"
Siva Vaidhyanathan, cultural historian & media scholar
Privacy is strangely hard to define
Image of “Reporters with various forms of "fake news" from an 1894 illustration by Frederick Burr Opper.”
https://commons.wikimedia.org/wiki/File:The_fin_de_si%C3%A8cle_newspaper_proprietor_(cropped).jpg
Image generated by DALL-E 2, which cannot spell "politics"
Privacy threats come in several categories:
https://plato.stanford.edu/entries/it-privacy/
In law, personal data is defined as data that can be linked with a natural person
Surveillance capitalism: “An economic system built on the secret extraction and manipulation of human data” – Shoshana Zuboff
Preach is an acronym that consumers and professionals can use to assess the maturity and effectiveness of a privacy program.
1990s: Insurance company collected patient data for ~135,000 state employees.
In 2006, Netflix released an anonymized dataset containing movie ratings from approximately 500,000 subscribers, as part of a competition to improve its recommendation algorithm.
AOL put de-identified/anonymized Internet search data (including health-related searches) on its web site. New York Times reporters were able to re-identify an individual from her search records within a few days (Porter, 2008).
The HIPAA Safe Harbor method specifies 18 data elements that must be removed or generalized in a data set in order for it to be considered “de-identified.” The HIPAA Safe Harbor data elements (aka direct identifiers) include the following:
1. Names
2. Zip codes (except the first three digits)
3. All elements of dates (except year)
4. Telephone numbers
5. Fax numbers
6. Electronic mail addresses
7. Social security numbers
8. Medical record numbers
9. Health plan beneficiary numbers
10. Account numbers
11. Certificate or license numbers
12. Vehicle identifiers and serial numbers, including license plate numbers
13. Device identifiers and serial numbers
14. Web Universal Resource Locators (URLs)
15. Internet Protocol (IP) address numbers
16. Biometric identifiers, including finger and voice prints
17. Full face photographic images and any comparable images
18. Any other unique identifying number, characteristic or code
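A minimal sketch of how the list above might be applied to a single record, assuming a dictionary-based schema. The field names, the `deidentify` helper, and the sample patient are all hypothetical, and the sketch covers only the simplified rules listed here (drop direct identifiers, keep the first three ZIP digits, keep only the year of a date). It is an illustration, not a compliance tool.

```python
from datetime import date

# Hypothetical field names; a real dataset will use its own schema.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "phone", "fax", "email", "ssn", "medical_record_number",
    "health_plan_number", "account_number", "license_number",
    "vehicle_id", "device_id", "url", "ip_address",
}

def deidentify(record: dict) -> dict:
    """Drop direct identifiers; generalize ZIP codes and dates."""
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIER_FIELDS:
            continue  # removed entirely
        if field == "zip":
            out[field] = str(value)[:3]  # keep only the first three digits
        elif isinstance(value, date):
            out[field] = value.year  # keep only the year
        else:
            out[field] = value
    return out

# Invented example record, for illustration only.
patient = {
    "name": "Jane Doe",
    "zip": "60614",
    "birth_date": date(1980, 5, 17),
    "email": "jane@example.com",
    "diagnosis": "asthma",
}
cleaned = deidentify(patient)
print(cleaned)  # {'zip': '606', 'birth_date': 1980, 'diagnosis': 'asthma'}
```

Note that the sensitive attribute (`diagnosis`) is left untouched: Safe Harbor targets identifiers, not the values being studied.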
• In 1997, using a known birth date, gender and zip code, a computer expert was able to identify the records of Gov. William Weld from an allegedly anonymous database of Massachusetts state employee health insurance claims (Barth-Jones, 2015).
• In 2007, with information available on the Internet, Texas researchers using a de-anonymization methodology were able to re-identify individual customers from a database of 500,000 Netflix subscribers (Narayanan, 2008).
• In 2013, Science (Gymrek, McGuire, Golan, Halperin, & Erlich, 2013) reported that researchers had successfully identified “de-identified” male genomes through correlations with commercial genealogy databases.
• Students were able to re-identify a significant percentage of individuals in the Chicago homicide database by linking with the social security death index (K. El Emam & Dankar, 2008).
• AOL put de-identified/anonymized Internet search data (including health-related searches) on its web site. New York Times reporters were able to re-identify an individual from her search records within a few days (Porter, 2008).
Measuring the “Identifiability” of Data
De-identification
Anonymization
Note another category of attributes: sensitive attributes. Examples include medical records, salaries, etc. A sensitive attribute is usually the value we want to predict or group by.
87% (216M of 248M) of the US population is uniquely identifiable based only on 5-digit ZIP code, gender, and date of birth
An attacker can carry out a re-identification attack by linking quasi-identifiers with external information, e.g. joining medical data with voter registration records
quasi-identifiers: Attributes that in combination can uniquely identify individuals
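A linkage attack of this kind can be sketched in a few lines. Everything below is an assumption for illustration: the two tables are invented toy data, and `qi_key` and `link` are hypothetical helper names. The idea is simply to join a "de-identified" release with a named external list on the quasi-identifiers and keep the matches that are unambiguous.

```python
QUASI_IDENTIFIERS = ("zip", "birth_date", "gender")

# Invented toy data: a release with names removed...
released = [
    {"zip": "30301", "birth_date": "1970-03-02", "gender": "F", "diagnosis": "diabetes"},
    {"zip": "30301", "birth_date": "1985-11-20", "gender": "M", "diagnosis": "asthma"},
    {"zip": "30318", "birth_date": "1985-11-20", "gender": "M", "diagnosis": "migraine"},
    {"zip": "30318", "birth_date": "1985-11-20", "gender": "M", "diagnosis": "flu"},
]
# ...and a public list that still carries names.
voter_roll = [
    {"name": "A. Rivera", "zip": "30301", "birth_date": "1970-03-02", "gender": "F"},
    {"name": "B. Chen", "zip": "30318", "birth_date": "1985-11-20", "gender": "M"},
    {"name": "C. Okafor", "zip": "30305", "birth_date": "1990-01-01", "gender": "F"},
]

def qi_key(row):
    """Project a row onto its quasi-identifier values."""
    return tuple(row[q] for q in QUASI_IDENTIFIERS)

def link(released_rows, external_rows):
    """Return (name, diagnosis) pairs where exactly one released row matches."""
    index = {}
    for row in released_rows:
        index.setdefault(qi_key(row), []).append(row)
    hits = []
    for person in external_rows:
        candidates = index.get(qi_key(person), [])
        if len(candidates) == 1:  # unambiguous match: re-identified
            hits.append((person["name"], candidates[0]["diagnosis"]))
    return hits

reidentified = link(released, voter_roll)
print(reidentified)  # [('A. Rivera', 'diabetes')]
```

B. Chen is not re-identified because two released rows share the same quasi-identifier values, which is exactly the protection that hiding in a group provides.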
Sweeney. K-anonymity: A model for Protecting Privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):557–570, 2002
Methods to achieve it:
2-anonymous new data
Because the dataset is now 2-anonymous, we can no longer be sure what Andre's medical problem is: the external data could correspond to more than one person in the table
He's been hidden in the crowd!
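One way to check whether a table is k-anonymous is to count how many rows share each combination of quasi-identifier values and look for groups smaller than k. The sketch below assumes rows are dictionaries; the table, the column names, and the helpers `small_groups` and `is_k_anonymous` are hypothetical.

```python
from collections import Counter

def small_groups(rows, quasi_identifiers, k):
    """Return the quasi-identifier combinations shared by fewer than k rows."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return {key: n for key, n in counts.items() if n < k}

def is_k_anonymous(rows, quasi_identifiers, k):
    """A table is k-anonymous when no group is smaller than k."""
    return not small_groups(rows, quasi_identifiers, k)

# Invented toy table with already-generalized values.
table = [
    {"zip": "303**", "age": "20-29", "condition": "flu"},
    {"zip": "303**", "age": "20-29", "condition": "asthma"},
    {"zip": "304**", "age": "40-49", "condition": "migraine"},
    {"zip": "304**", "age": "40-49", "condition": "diabetes"},
    {"zip": "304**", "age": "50-59", "condition": "flu"},
]
qi = ("zip", "age")

print(is_k_anonymous(table, qi, 2))  # False
print(small_groups(table, qi, 2))    # {('304**', '50-59'): 1}

# Suppressing the one outlier row makes the table 2-anonymous.
suppressed = table[:-1]
print(is_k_anonymous(suppressed, qi, 2))  # True
```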
Your turn!
make this table 2-anonymous
https://www.sciencedirect.com/science/article/pii/S0167404821003126
Your turn!
Tip: sometimes it can help to draw hierarchies of values to figure out how to generalize them, or whether you need to suppress them altogether
Here's one for zip codes starting with 3
https://www.sciencedirect.com/science/article/pii/S0167404821003126
original zips
level 0
level 1
level 2
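A zip code hierarchy like this can be expressed as a small function. This is a sketch under an assumption: level n is taken to mean "mask the last n digits", so level 0 is the original zip; the numbering in the figure may differ. The name `generalize_zip` is hypothetical.

```python
def generalize_zip(zip_code: str, level: int) -> str:
    """Mask the last `level` digits of a zip code with '*'."""
    if not 0 <= level <= len(zip_code):
        raise ValueError("level must be between 0 and the zip code length")
    keep = len(zip_code) - level
    return zip_code[:keep] + "*" * level

for level in range(3):
    print(level, generalize_zip("30301", level))
# 0 30301
# 1 3030*
# 2 303**
```

Masking every digit (level 5 for a five-digit zip) corresponds to suppressing the value entirely.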
Here's one for age
Exercise
make this table 2-anonymous
we can generalize Zip code and Age:
Zip Code: Group into broader regions based on the first three digits:
Age: Group into age ranges:
Some options, but there are many more!
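The generalization described above (first three zip digits, ten-year age ranges) can be sketched as follows. The table is invented, the column names and the helpers `generalize_row` and `smallest_group` are hypothetical, and ten-year ranges are just one of many possible groupings.

```python
from collections import Counter

def generalize_row(row):
    """Keep the first three zip digits and bucket age into a ten-year range."""
    low = (row["age"] // 10) * 10
    return {
        "zip": row["zip"][:3] + "**",
        "age": f"{low}-{low + 9}",
        "condition": row["condition"],  # sensitive attribute left as-is
    }

def smallest_group(rows, quasi_identifiers=("zip", "age")):
    """Size of the smallest group of rows sharing quasi-identifier values."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
    return min(counts.values())

# Invented toy table.
original = [
    {"zip": "30301", "age": 23, "condition": "flu"},
    {"zip": "30305", "age": 27, "condition": "asthma"},
    {"zip": "30318", "age": 34, "condition": "diabetes"},
    {"zip": "30312", "age": 38, "condition": "flu"},
    {"zip": "30401", "age": 45, "condition": "migraine"},
    {"zip": "30409", "age": 41, "condition": "asthma"},
]
generalized = [generalize_row(r) for r in original]

print(smallest_group(original))     # 1: every row is unique
print(smallest_group(generalized))  # 2: the table is now 2-anonymous
```

The trade-off is visible in the output: each generalized row is less precise, but no row can be singled out by zip and age alone.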
3 common attacks
Other methods than k-anonymity
#1 Privacy is Dead
resignation: the acceptance of something undesirable but inevitable.
#2 (Young) People Don't Care about Privacy
How It Emerged
Youth & Social Media Usage: Perception that younger generations freely share personal info online
Surveys & Studies: Misinterpretation of data suggesting apathy towards privacy
Corporate Narratives: Companies downplay privacy concerns to justify data collection practices
Why It Persists:
Complex Privacy Settings: Users overwhelmed by complicated privacy management
Behavior vs. Attitude Gap: Actions (sharing online) are misinterpreted as indifference despite underlying concerns; this gap is called the "privacy paradox"
Normalization: Constant exposure to data sharing norms reduces perceived importance of privacy.
privacy paradox: people express concern about privacy but still share personal information.
#3 Nothing to Hide, Nothing to Fear
#4 Privacy is Bad for Business
In your group, read through an article about how automatically monitored data is being collected and modeled
https://guides.libraries.psu.edu/berks/privacy#s-lg-box-19510455
Report back on the positive and negative impacts of these practices on individuals and society.
Micro-lecture (10-15 minutes)
“Falsehood flies, and truth comes limping after it, so that when men come to be undeceived, it is too late; the jest is over, and the tale hath had its effect.”
Image of “Reporters with various forms of "fake news" from an 1894 illustration by Frederick Burr Opper.”
https://commons.wikimedia.org/wiki/File:The_fin_de_si%C3%A8cle_newspaper_proprietor_(cropped).jpg
Market
Design
Privacy by Design means:
Law
Norms
Activity: do this quiz and take notes on what the patterns are called and their descriptions
Where have you left data tracks today?
What data do you think is collected about you regularly?
What apps do you use daily? Weekly?
What steps do you already take to protect your data?
What does privacy mean to you?
Benefits of large-scale data
“Because the commoditization of consumer data isn’t likely to end anytime soon, it’s up to the businesses that gather and profit from this data to engage directly with their customers and establish data protections they will trust.”[1]
How much does consent matter, vs. consequences?