CombΘ Seminar
An Introduction to Differential Privacy and Fingerprinting Techniques for Lower Bounds
Victor Sanches Portella
November, 2024
cs.ubc.ca/~victorsp
Who am I?
Postdoc supervised by Prof. Yoshiharu Kohayakawa
Interests
ML Theory
Optimization
Randomized Algs
Privacy? Why, What, and How
What do we mean by "privacy" in this case?
Informal Goal: the output should not reveal (too much) information about any single individual
The output should still have information about the population
(Diagram: dataset → Data Analysis → Output)
We are not considering protections against security breaches; that has more to do with "confidentiality" than "privacy"
Real-life example - NY Taxi Dataset
Summary: License plates were anonymized using MD5
Easy to de-anonymize due to license plate structure
By Vijay Pandurangan
https://www.vijayp.ca/articles/blog/2014-06-21_on-taxis-and-rainbows--f6bc289679a1.html
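A minimal sketch of why this anonymization fails, assuming a simplified plate format (one digit, one letter, two digits; the real medallion formats differ but are similarly small): the whole keyspace can be hashed and inverted in seconds.

```python
import hashlib

def md5_hex(s: str) -> str:
    return hashlib.md5(s.encode()).hexdigest()

def build_rainbow_table() -> dict[str, str]:
    """Hash every plate of the (hypothetical) form <digit><letter><digit><digit>."""
    table = {}
    for d1 in "0123456789":
        for ch in "ABCDEFGHIJKLMNOPQRSTUVWXYZ":
            for d2 in "0123456789":
                for d3 in "0123456789":
                    plate = f"{d1}{ch}{d2}{d3}"
                    table[md5_hex(plate)] = plate
    return table  # only 26 * 10^3 = 26,000 entries -- tiny keyspace

# "Anonymized" record as released in the dataset:
anonymized = md5_hex("5X55")
recovered = build_rainbow_table()[anonymized]
print(recovered)  # -> 5X55: the anonymization is fully reversible
```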
Real-life example - Netflix Dataset
Summary: "anonymized" movie ratings released for the Netflix Prize were re-identified by linking them with public IMDb reviews [Narayanan, Shmatikov '08]
Takeaways from the Examples
Privacy is quite delicate to get right
Hard to take into account side information
"Anonymization" is hard to define and implement properly
Different use cases require different levels of protection
Differential Privacy
(Diagram: neighboring datasets produce Output 1 and Output 2, which should be indistinguishable)
Differential Privacy
Anything learned with an individual in the dataset
can (likely) be learned without
\(\mathcal{M}\) needs to be randomized to satisfy DP
Without randomization, an adversary with full information about all but one individual can infer that individual's membership
Differential Privacy (Formally)
Definition: \(\mathcal{M}\) is \((\varepsilon, \delta)\)-Differentially Private (\((\varepsilon, \delta)\)-DP) if, for any pair of neighboring datasets \(X, X'\) (they differ in one entry) and any set \(S\) of outputs,
\[\Pr[\mathcal{M}(X) \in S] \leq e^{\varepsilon}\, \Pr[\mathcal{M}(X') \in S] + \delta\]
\(\varepsilon \equiv \) "Privacy leakage"; in theory, a constant \(\leq 1\)
\(\delta \equiv \) "Chance of failure", usually \(o(1/|X|)\)
The Advantages of Differential Privacy
Worst case: No assumptions on the adversary
Immune to post-processing: any computation on the output cannot degrade the privacy guarantees
Composable: DP guarantees of different algorithms compose nicely, even if done in sequence and adaptively
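As a toy illustration of how guarantees combine, here is a sketch of a "basic composition" accountant (the simplest composition theorem; tighter advanced-composition bounds exist, and the numbers below are made up):

```python
# Basic composition: running k adaptively chosen (eps_i, delta_i)-DP
# algorithms on the same data is (sum eps_i, sum delta_i)-DP.

def basic_composition(guarantees: list[tuple[float, float]]) -> tuple[float, float]:
    eps_total = sum(eps for eps, _ in guarantees)
    delta_total = sum(delta for _, delta in guarantees)
    return eps_total, delta_total

# e.g., three mechanisms run in sequence (possibly adaptively):
print(basic_composition([(0.1, 1e-6), (0.2, 1e-6), (0.1, 0.0)]))
# -> (0.4, 2e-06); post-processing any of the outputs costs nothing extra
```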
DP and Other Areas of ML and TCS
Online Learning
Adaptive Data Analysis and Generalization in ML
Robust statistics
Proof uses Ramsey theory :)
An Example: Computing the Mean
Goal: an \((\varepsilon, \delta)\)-DP \(\mathcal{M}\) such that \(\big\lVert \mathcal{M}(X) - \frac{1}{n}\sum_{i=1}^n x_i \big\rVert\) is small, i.e., \(\mathcal{M}\) approximates the mean
Algorithm: \(\mathcal{M}(X) = \frac{1}{n}\sum_{i=1}^n x_i + Z\) with \(Z\) Gaussian or Laplace noise
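A minimal Python sketch of this algorithm, assuming the data points lie in \([-1,1]^d\) (so changing one row moves the mean by at most \(2\sqrt{d}/n\) in \(\ell_2\)) and using the standard Gaussian-mechanism calibration, valid for \(\varepsilon < 1\):

```python
import numpy as np

rng = np.random.default_rng(0)

def private_mean(X: np.ndarray, eps: float, delta: float) -> np.ndarray:
    """Gaussian mechanism for the mean of rows x_i in [-1, 1]^d."""
    n, d = X.shape
    l2_sensitivity = 2 * np.sqrt(d) / n            # worst-case change of the mean
    sigma = l2_sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / eps
    return X.mean(axis=0) + rng.normal(0.0, sigma, size=d)

X = rng.choice([-1.0, 1.0], size=(10_000, 20))     # n = 10000 points in {-1,1}^20
est = private_mean(X, eps=0.5, delta=1e-6)
print(np.linalg.norm(est - X.mean(axis=0)))        # error shrinks as n grows
```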
OPTIMAL?
Theorem
With \(Z \sim \mathcal{N}(0, \sigma^2 I)\) and \(\sigma = \Theta\big(\tfrac{\sqrt{d \log(1/\delta)}}{\varepsilon n}\big)\),
\(\mathcal{M}\) is \((\varepsilon, \delta)\)-DP and \(\mathbb{E}\big\lVert \mathcal{M}(X) - \tfrac{1}{n}\sum_{i=1}^n x_i \big\rVert_2 = O\big(\tfrac{d\sqrt{\log(1/\delta)}}{\varepsilon n}\big)\)
Fingerprinting Codes
A Lower Bound Strategy
1. Feed to \(\mathcal{M}\) a marked input \(X\)
2. Assume \(\mathcal{M}\) is accurate ⟹ the adversary can detect some \(x_i\) from \(\mathcal{M}(X)\) with high probability
3. \((\varepsilon, \delta)\)-DP implies the adversary also detects \(x_i\) on \(\mathcal{M}(X')\) with \(X' = X - \{x_i\} + \{z\}\), even though \(x_i \notin X'\)
4. CONTRADICTION
Avoiding Pirated Movies via Fingerprinting
(Diagram: the movie owner distributes \(n\) copies; the movie may leak as a pirated copy. Can we detect one of the pirates?)
Idea: Mark some of the scenes (Fingerprinting)
Fingerprinting Codes
\(d\) scenes, \(n\) copies of the movie: each copy gets a codeword in \(\{0,1\}^d\) (1 = marked scene, 0 = unmarked scene)
The code is usually randomized
Example of pirating (the marking condition): in a scene where the colluders' copies differ, the pirated copy can show 0 or 1; in a scene where all their copies agree (e.g., only 1's), the pirated copy must keep that bit
We can do it with \(d = 2^n\). Can \(d\) be smaller?
Goal of fingerprinting: given a pirated copy of the movie, trace back one of the colluders, with probability of false positive \(o(1/n)\)
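A toy simulation of the marking condition and a naive correlation-based tracer; this is a hypothetical illustration, not the Tardos construction, and the pirates here play a random strategy (real codes must trace any strategy satisfying the marking condition):

```python
import numpy as np

rng = np.random.default_rng(1)

# Each of n copies is a codeword in {0,1}^d; a coalition combines its copies.
n, d = 5, 2000
code = rng.integers(0, 2, size=(n, d))   # randomized code
colluders = [0, 2, 4]

def pirate(copies: np.ndarray) -> np.ndarray:
    """Marking condition: where all colluders agree, the pirated copy must
    keep that bit; where they differ, pick a bit arbitrarily (here: random)."""
    agree = copies.min(axis=0) == copies.max(axis=0)
    return np.where(agree, copies[0], rng.integers(0, 2, size=copies.shape[1]))

y = pirate(code[colluders])

# Naive tracing: accuse the user whose codeword best matches the pirated copy.
scores = (code == y).sum(axis=1)
print(scores, "accused:", scores.argmax())
# Against this random strategy, colluders score ~5d/8 in expectation,
# innocent users ~d/2, so the gap is easy to detect for moderate d.
```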
Fingerprinting Codes for Lower Bounds
1. Feed to \(\mathcal{M}\) a marked input \(X\) (the rows of a fingerprinting code)
2. Assume \(\mathcal{M}\) is accurate ⟹ viewing the output as a pirated movie, the adversary can trace some \(x_i\) with high probability
3. \((\varepsilon, \delta)\)-DP implies the adversary also traces \(x_i\) on \(\mathcal{M}(X')\) with \(X' = X - \{x_i\} + \{z\}\)
4. CONTRADICTION: tracing \(x_i \notin X'\) breaks the false positive guarantee
FP codes exist with \(d = \tilde{O}(n^2)\) [Tardos '08]
The Good, The Bad, and The Ugly of Codes
The Ugly:
Black-box use of FP codes makes them hard to adapt to other settings
The Bad:
Very restricted to binary inputs
The Good:
Leads to optimal lower bounds for a variety of problems
Fingerprinting Lemmas
Idea: For some distribution on the input,
the output is highly correlated with the input
Lemma (A 1D Fingerprinting Lemma, [Bun, Steinke, Ullman '16])
\(\mathcal{M} \colon [-1,1]^n \to [-1,1]\)
\(p \sim \mathrm{Unif}([-1,1])\)
\(x_1, \dotsc, x_n \in \{\pm 1\}\) random such that \(\mathbb{E}[x_i] = p\)
"Correlation" between \(x_i\) and \(\mathcal{M}(X)\): \(\mathcal{A}(x_i, \mathcal{M}(X)) = \mathcal{M}(X)\,(x_i - p)\)
If \(\mathcal{M}(X)\) approximates \(\bar{x}\), then \(\mathbb{E}\big[\textstyle\sum_{i=1}^n \mathcal{A}(x_i, \mathcal{M}(X))\big] = \Omega(1)\)
Fingerprinting Lemma - Picture
If \(\mathcal{M}\) is accurate: \(\mathbb{E}\big[\sum_i \mathcal{A}(x_i, \mathcal{M}(X))\big]\) is large
If \(z\) is independent of \(X\): \(\mathbb{E}[\mathcal{A}(z, \mathcal{M}(X))]\) is small
(How large and how small depend on the distribution of \(X\) and \(p\))
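A quick Monte Carlo sanity check of this dichotomy; a minimal sketch, assuming the statistic \(\mathcal{A}(x_i, m) = m\,(x_i - p)\) (one standard choice; the exact statistic in the paper may differ):

```python
import numpy as np

rng = np.random.default_rng(2)

n, trials = 50, 20_000

def total_correlation(mechanism) -> float:
    vals = []
    for _ in range(trials):
        p = rng.uniform(-1, 1)                                  # p ~ Unif([-1,1])
        x = np.where(rng.random(n) < (1 + p) / 2, 1.0, -1.0)    # E[x_i] = p
        m = mechanism(x)
        vals.append(m * np.sum(x - p))                          # sum_i A(x_i, m)
    return float(np.mean(vals))

print(total_correlation(lambda x: x.mean()))  # accurate M: ~2/3 (large)
print(total_correlation(lambda x: 0.0))       # M ignores X: ~0   (small)
```

The exact value 2/3 for the empirical mean follows from \(\mathbb{E}[1 - p^2] = 2/3\) when \(p \sim \mathrm{Unif}([-1,1])\).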
From 1D Lemma to a Code(-Like) Object
Fingerprinting Lemma leads to a kind of fingerprinting code
Bonus: quite transparent and easy to describe
Key Idea: Make \(d = \tilde{O}(n^2)\) independent copies of the 1D lemma
\(\mathcal{M} \colon ([-1,1]^d)^n \to [-1,1]^d\)
\(p \sim \mathrm{Unif}([-1,1])^d\)
\(x_1, \dotsc, x_n \in \{\pm 1\}^d\) random such that \(\mathbb{E}[x_i] = p\) coordinate-wise
For \(d = \Omega(n^2 \log n)\), the correlations \(\mathcal{A}(x_i, \mathcal{M}(X))\), summed over coordinates, trace back some \(x_i\)
From Lemma to Lower Bounds
If \(\mathcal{M}\) is accurate, i.e. \(\mathbb{E}\big[\lVert \mathcal{M}(X) - p\rVert_2^2\big] \leq d/6\), the total correlation \(\sum_i \mathcal{A}(x_i, \mathcal{M}(X))\) is high
If \(\mathcal{M}\) is \((\varepsilon, \delta)\)-DP, the total correlation is low
These two facts are incompatible when \(n\) is too small ⟹ lower bound on \(n\)
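For intuition on the DP side, here is one standard way the "correlation is low" step is argued; a sketch assuming a nonnegative bounded statistic (the actual proof handles signs and truncation carefully):

```latex
% Sketch: why (eps, delta)-DP forces low correlation.
% For any nonnegative f bounded by B, integrating the DP inequality
% Pr[M(X) in S] <= e^eps * Pr[M(X') in S] + delta over the level sets of f:
\[
  \mathbb{E}\big[ f(\mathcal{M}(X)) \big]
  \;\leq\; e^{\varepsilon}\, \mathbb{E}\big[ f(\mathcal{M}(X')) \big] + \delta B .
\]
% Taking f to be a (shifted, truncated) version of A(x_i, .) and
% X' = X - {x_i} + {z}: on X', the statistic behaves like A(z, M(X')),
% which has mean close to 0 since z is independent of X'. Hence each
% E[A(x_i, M(X))] is O(eps + delta B): small when eps <= 1 and delta is tiny.
```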
Extension to Gaussian Case
Lemma (Gaussian Fingerprinting Lemma)
\(\mathcal{M}\colon \mathbb{R}^n \to \mathbb{R}\)
\(\mu \sim \mathcal{N}(0, 1/2)\)
\(x_1, \dotsc, x_n \sim \mathcal{N}(\mu,1)\) i.i.d. given \(\mu\)
If \(\mathcal{M}(X)\) approximates \(\mu\), then \(\mathbb{E}\big[\sum_{i=1}^n (\mathcal{M}(X) - \mu)(x_i - \mu)\big]\) is large
One advantage of lemmas over codes:
Easier to extend to different settings
Implies similar lower bounds for privately estimating the mean of a Gaussian
Lower Bounds for Gaussian Covariance Matrix Estimation
Work done in collaboration with Nick Harvey
Privately Estimating a Covariance Matrix
Samples from a Gaussian on \(\mathbb{R}^d\) with unknown covariance matrix \(\Sigma\)
Goal: an \((\varepsilon, \delta)\)-differentially private \(\mathcal{M}\) to estimate \(\Sigma\)
Known algorithmic results: there is an \((\varepsilon, \delta)\)-DP \(\mathcal{M}\) achieving the goal, with a sample complexity whose terms are either required even without privacy or required even for \(d = 1\)
Is this tight?
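For context, a naive sketch of an algorithm of this flavor: clip the samples, form the empirical second moment, and add symmetric Gaussian noise. The clipping radius and calibration below are illustrative assumptions, not the algorithms behind the known bounds:

```python
import numpy as np

rng = np.random.default_rng(4)

def private_covariance(X: np.ndarray, eps: float, delta: float, clip: float) -> np.ndarray:
    """Clip rows to L2 norm `clip`, compute the empirical second moment,
    and add symmetric Gaussian noise scaled to its sensitivity (a rough
    sketch; exact calibration of symmetric-matrix noise varies)."""
    n, d = X.shape
    norms = np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    Xc = X * np.minimum(1.0, clip / norms)          # row-wise clipping
    S = Xc.T @ Xc / n                                # empirical second moment
    sens = 2 * clip**2 / n                           # Frobenius sensitivity of S
    sigma = sens * np.sqrt(2 * np.log(1.25 / delta)) / eps
    E = rng.normal(0.0, sigma, size=(d, d))
    return S + (E + E.T) / np.sqrt(2)                # symmetrized noise

Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
X = rng.multivariate_normal(np.zeros(2), Sigma, size=50_000)
print(private_covariance(X, eps=0.5, delta=1e-6, clip=4.0))
```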
Roadblocks to Fingerprinting Lemmas
Setting: unknown covariance matrix \(\Sigma\) of a Gaussian on \(\mathbb{R}^d\)
To get a Fingerprinting Lemma, we need a random \(\Sigma\)
Most FP lemmas are for \(d = 1\) and are then boosted with independent copies; for a covariance matrix, the diagonal and off-diagonal entries cannot all be made independent, which leads to limited lower bounds for covariance estimation [Kamath, Mouzakis, Singhal '22]
We can use diagonally dominant matrices, but then outputting \(0\) already has error \(O(1)\): we can't lower bound the accuracy of algorithms with \(\omega(1)\) error
Our Results
Theorem
For any \((\varepsilon, \delta)\)-DP algorithm \(\mathcal{M}\) that estimates \(\Sigma\) accurately, a lower bound on the number of samples follows
Our result covers both regimes, up to nearly the highest reasonable value of the error
Previous lower bounds required the extra conditions of [Kamath et al. '22] OR of [Narayanan '23]
Main Contribution: Fingerprinting Lemma without independence
Which Distribution to Use? The Wishart Distribution
Our results use a very natural distribution: \(\Sigma = G G^{\mathsf{T}}\), where \(G\) is a \(d \times 2d\) random Gaussian matrix
Natural distribution over PSD matrices
Entries are highly correlated
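Sampling from this distribution is a one-liner; a minimal sketch (with \(2d\) Gaussian columns, \(\Sigma\) is PSD by construction and full rank with probability 1):

```python
import numpy as np

rng = np.random.default_rng(3)

d = 4
G = rng.normal(size=(d, 2 * d))    # d x 2d random Gaussian matrix
Sigma = G @ G.T                     # Wishart-distributed covariance matrix

eigvals = np.linalg.eigvalsh(Sigma)
print(eigvals)                      # all strictly positive (a.s.): Sigma is PSD
```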
A Different Correlation Statistic
A Peek Into the Proof for 1D
Lemma (Gaussian Fingerprinting Lemma)
\(\mu \sim \mathcal{N}(0, 1/2)\)
\(x_1, \dotsc, x_n \sim \mathcal{N}(\mu,1)\)
Claim 1
Claim 2
Stein's Lemma: \(\mathbb{E}[\mu\, g(\mu)] = \sigma^2\, \mathbb{E}[g'(\mu)]\) for \(\mu \sim \mathcal{N}(0, \sigma^2)\)
Follows from integration by parts
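For completeness, the integration-by-parts calculation behind Stein's Lemma (scalar case):

```latex
% Stein's lemma for \mu ~ N(0, sigma^2). The Gaussian density
% \varphi(t) = e^{-t^2/(2\sigma^2)} / \sqrt{2\pi\sigma^2} satisfies
% \varphi'(t) = -(t/\sigma^2)\varphi(t), i.e. t\,\varphi(t) = -\sigma^2\varphi'(t).
\begin{align*}
  \mathbb{E}[\mu\, g(\mu)]
    &= \int_{-\infty}^{\infty} g(t)\, t\, \varphi(t)\, \mathrm{d}t
     = -\sigma^2 \int_{-\infty}^{\infty} g(t)\, \varphi'(t)\, \mathrm{d}t \\
    &= -\sigma^2 \Big[\, g(t)\,\varphi(t) \,\Big]_{-\infty}^{\infty}
       + \sigma^2 \int_{-\infty}^{\infty} g'(t)\, \varphi(t)\, \mathrm{d}t
     = \sigma^2\, \mathbb{E}[g'(\mu)],
\end{align*}
% where the boundary term vanishes for g of moderate growth.
```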
A Peek Into the Proof of New FP Lemma
Fingerprinting Lemma
Need to lower bound the (matrix) correlation statistic
\(\Sigma \sim\) Wishart leads to an elegant analysis
Stein-Haff Identity: "move the derivative" from \(g\) to the density \(p\) with integration by parts (via Stokes' Theorem)
Takeaways
Differential Privacy is a mathematically rigorous definition of private algorithms
Interesting connections to other areas of theory
Fingerprinting Codes lead to many optimal lower bounds for DP
Fingerprinting Lemmas are more versatile for lower bounds and can be adapted to other settings
New Fingerprinting Lemma escaping the need to bootstrap a 1D result to higher dimensions with independent copies
Thanks!