Differential Privacy is kind of like color mixing on hard mode!
Standard anonymization (simply stripping names out of a data set) offers little to no protection. Like the color purple, you can easily work out that its constituents (the sensitive information) are red and blue.
Problems with other standard approaches to data privacy protection:
Supposedly anonymized data can be re-identified easily by cross-referencing it with other sources.
Example: a data set includes gender, residence, age, DOB, and malaria status
---> deduce who each record belongs to
---> sensitive data is leaked
This is a linkage attack.
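To make the linkage attack concrete, here is a small Python sketch; the table contents, column names, and the public name list are invented for illustration and are not from the original. The point is that an "anonymized" medical table still shares quasi-identifiers (gender, residence, DOB) with other public data, and joining on those columns re-identifies the record.

# "Anonymized" medical data: names removed, but quasi-identifiers kept.
medical = [
    {"gender": "F", "residence": "Springfield", "dob": "1990-04-02", "malaria": True},
    {"gender": "M", "residence": "Shelbyville", "dob": "1985-11-17", "malaria": False},
]

# A public list that contains names and the same quasi-identifiers.
public = [
    {"name": "Alice Jones", "gender": "F", "residence": "Springfield", "dob": "1990-04-02"},
    {"name": "Bob Smith", "gender": "M", "residence": "Shelbyville", "dob": "1985-11-17"},
]

KEYS = ("gender", "residence", "dob")

# Join the two tables on the shared quasi-identifiers.
for med in medical:
    for pub in public:
        if all(med[k] == pub[k] for k in KEYS):
            print(pub["name"], "->", "malaria" if med["malaria"] else "no malaria")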
Even query access through a trusted "curator" is not safe: analysts can ask a series of carefully chosen questions and combine the answers to identify who the data belongs to.
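Here is an equally small sketch of how answers from a trusted curator can leak data (a hypothetical differencing attack; the names, records, and curator_count helper are mine, not from the original). Two innocent-looking counting queries, one including a target person and one excluding them, differ by exactly that person's sensitive value.

# Each record: (name, has_malaria). The analyst never sees raw records,
# only answers to aggregate questions from the trusted curator.
records = [("Ana", 1), ("Ben", 0), ("Caro", 1), ("Dan", 0)]

def curator_count(predicate):
    # The curator answers exact counting queries -- no noise is added.
    return sum(value for name, value in records if predicate(name))

q1 = curator_count(lambda name: True)            # malaria count, everyone
q2 = curator_count(lambda name: name != "Caro")  # malaria count, everyone but Caro

print(q1 - q2)  # 1 -> the analyst learns that Caro has malaria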
This is differential privacy: imagine your raw data is green, but differential privacy encodes it to an ambiguous green. We (the data seekers) can still learn that it's green, but it is much harder to find its exact components.
To achieve differential privacy, we have to introduce randomness (aka noise). The amount of noise depends on:
1. the sensitivity of the query (its global sensitivity)
2. the desired level of privacy (the privacy-utility trade-off)
A rough sketch of how these two inputs combine follows below.
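As a concrete illustration, here is a minimal Python sketch of the Laplace mechanism, one common way of adding calibrated noise; the function name, the toy data, and the parameter values are my own, and epsilon (the privacy budget) is standard differential-privacy notation rather than something from the graphic. The key line is that the noise scale equals sensitivity divided by epsilon.

import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    # Noise scale grows with the query's global sensitivity and shrinks
    # as epsilon grows: more sensitivity -> more noise needed,
    # smaller epsilon -> more privacy but less accuracy.
    rng = rng or np.random.default_rng()
    scale = sensitivity / epsilon
    return true_answer + rng.laplace(loc=0.0, scale=scale)

# Toy counting query: how many people in the data set have malaria?
# Adding or removing one person changes a count by at most 1,
# so its global sensitivity is 1.
has_malaria = [True, False, True, True, False]
true_count = sum(has_malaria)

private_count = laplace_mechanism(true_count, sensitivity=1, epsilon=0.5)
print(true_count, round(private_count, 2))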
Low sensitivity. Query: is it still a mix of colors? With or without one of the colors, the answer stays "yes."
High sensitivity. Query: a mix of how many colors? Remove one color and the answer changes from 3 to 2.
If a difference of one data entry heavily impacts the function output, the query has high sensitivity; if the difference only negligibly impacts the output, the query has low sensitivity.
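To put numbers on those two cases, here is a small Python comparison of a low-sensitivity and a high-sensitivity query on neighbouring data sets (data sets that differ in exactly one entry); the ages and values are invented for illustration.

# Two neighbouring data sets: they differ in exactly one entry.
ages_a = [34, 29, 51, 47, 90]
ages_b = [34, 29, 51, 47]        # one person removed

# Low-sensitivity query: a count changes by at most 1.
print(abs(len(ages_a) - len(ages_b)))   # 1

# High-sensitivity query: a sum (or a maximum) can swing by as much as
# the largest value a single entry can take.
print(abs(sum(ages_a) - sum(ages_b)))   # 90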
If you want accuracy, there is less privacy: it is easier to infer the blue and yellow from the green.
If you want privacy, there is less accuracy: it is harder to infer the blue and yellow from the green.
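You can see this trade-off directly by reusing the Laplace-noise idea from the sketch above with different values of epsilon (again an illustrative sketch, not the original author's example): a small epsilon gives strong privacy but scattered answers, a large epsilon gives accurate answers but weak privacy.

import numpy as np

rng = np.random.default_rng(0)
true_count = 3   # the "green" answer we want to release privately

for epsilon in (0.1, 1.0, 10.0):
    # Counting query, sensitivity 1, so the noise scale is 1 / epsilon.
    noisy = [true_count + rng.laplace(scale=1.0 / epsilon) for _ in range(5)]
    print(epsilon, [round(x, 1) for x in noisy])

# Small epsilon -> answers scatter widely (more privacy, less accuracy).
# Large epsilon -> answers hug the true count (less privacy, more accuracy).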
(Figure panels: original data vs. synthetic data)