CombΘ Seminar

An Introduction to Differential Privacy and Fingerprinting Techniques for Lower Bounds

Victor Sanches Portella

November, 2024

cs.ubc.ca/~victorsp

Who am I?

Postdoc supervised by Prof. Yoshiharu Kohayakawa

Interests: ML Theory, Optimization, Randomized Algorithms

Privacy? Why, What, and How

What do we mean by "privacy" in this case?

Informal Goal: the output should not reveal (too much) about any single individual

At the same time, the output should carry information about the population

Not considering protections against security breaches: that has more to do with "confidentiality" than "privacy"

Real-life example - NY Taxi Dataset

Summary: License plates were "anonymized" by hashing them with MD5

Easy to de-anonymize: license plates have a small, structured format, so all possible hashes can be precomputed

By Vijay Pandurangan
https://www.vijayp.ca/articles/blog/2014-06-21_on-taxis-and-rainbows--f6bc289679a1.html
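A minimal sketch of the attack, assuming a simplified medallion format of one digit, one letter, and two digits (the real dataset had a handful of such formats): the space of plates is so small that every hash can be precomputed.

```python
import hashlib
from itertools import product

# Why hashing license plates fails: the plate format (assumed here
# to be digit-letter-digit-digit, e.g. "5X55") spans a tiny space.
DIGITS = "0123456789"
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def build_lookup():
    # Hash every possible plate: only 10 * 26 * 10 * 10 = 26,000 of them.
    table = {}
    for d1, ltr, d2, d3 in product(DIGITS, LETTERS, DIGITS, DIGITS):
        plate = d1 + ltr + d2 + d3
        table[hashlib.md5(plate.encode()).hexdigest()] = plate
    return table

lookup = build_lookup()
anonymized = hashlib.md5(b"5X55").hexdigest()
print(lookup[anonymized])  # prints "5X55": the "anonymization" is undone
```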

Real-life example - Netflix Dataset

The "anonymized" Netflix Prize ratings were famously de-anonymized by cross-referencing them with public IMDb reviews [Narayanan, Shmatikov '08]

Takeaways from the Examples

Privacy is quite delicate to get right

Hard to take into account side information

"Anonymization" is hard to define and implement properly

Different use cases require different levels of protection

Differential Privacy

[Diagram: \(\mathcal{M}\) run on two neighboring datasets produces Output 1 and Output 2, which should be indistinguishable]

Differential Privacy

Anything learned with an individual in the dataset can (likely) be learned without that individual

\(\mathcal{M}\) needs to be randomized to satisfy DP: against a deterministic \(\mathcal{M}\), an adversary with full information about all but one individual can infer that individual's membership

Differential Privacy (Formally)

Definition: \(\mathcal{M}\) is \((\varepsilon, \delta)\)-Differentially Private if, for any pair of neighboring datasets \(X, X'\) (i.e., datasets that differ in one entry) and all events \(S\),

\displaystyle \mathbb{P}(\mathcal{M}(X) \in S) \leq e^{\varepsilon} \cdot \mathbb{P}(\mathcal{M}(X') \in S) + \delta

\(\varepsilon \equiv\) "privacy leakage", in theory a constant \(\leq 1\)

\(\delta \equiv\) "chance of failure", usually \(o(1/|X|)\)
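To make the definition concrete, here is a minimal sketch (not from the talk) of the classic Laplace mechanism on a counting query; the function name and data are illustrative.

```python
import numpy as np

# A count changes by at most 1 between neighboring datasets
# (sensitivity 1), so adding Laplace(1/eps) noise gives (eps, 0)-DP.
def private_count(data, predicate, eps):
    true_count = sum(predicate(x) for x in data)
    return true_count + np.random.laplace(scale=1.0 / eps)

ages = [23, 35, 41, 29, 52]
print(private_count(ages, lambda age: age >= 30, eps=0.5))
```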

The Advantages of Differential Privacy

Worst case: No assumptions on the adversary

Immune to post-processing: Any computation on the output can only improve the privacy guarantees

Composable: DP guarantees of different algorithms compose nicely, even if done in sequence and adaptively
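For instance, basic composition, a standard fact that also holds adaptively: if \(\mathcal{M}_1\) is \((\varepsilon_1, \delta_1)\)-DP and \(\mathcal{M}_2\) is \((\varepsilon_2, \delta_2)\)-DP, then

\displaystyle X \mapsto \big(\mathcal{M}_1(X), \mathcal{M}_2(X)\big) \text{ is } (\varepsilon_1 + \varepsilon_2,\; \delta_1 + \delta_2)\text{-DP}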

DP and Other Areas of ML and TCS

Online Learning

Adaptive Data Analysis and Generalization in ML

Robust Statistics

Proof uses Ramsey theory :)

An Example: Computing the Mean

Input: \(X = (x_1, \dotsc, x_n)\) with \(x_i \in [-1,1]^d\)

Goal: an \((\varepsilon, \delta)\)-DP \(\mathcal{M}\) that approximates the mean, i.e.,

\displaystyle \mathbb{E}\Big[\lVert \mathcal{M}(X) - \mathrm{Mean}(X)\rVert \Big] \text{ is small}

Algorithm: \(\mathcal{M}(X) = \mathrm{Mean}(X) + Z\), with \(Z\) Gaussian or Laplace noise


Theorem

For \(Z \sim \mathcal{N}(0, \sigma^2 I)\) with

\displaystyle \sigma \approx \frac{d}{n} \frac{\sqrt{\ln(1/\delta)}}{\varepsilon}

\(\mathcal{M}\) is \((\varepsilon, \delta)\)-DP and

\displaystyle \mathbb{E}\Big[\lVert \mathcal{M}(X) - \mathrm{Mean}(X)\rVert_2\Big] \leq \sigma \approx \frac{d}{n} \frac{\sqrt{\ln(1/\delta)}}{\varepsilon}

Is this error rate OPTIMAL?
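Before turning to lower bounds, a runnable sketch of this mechanism, assuming the standard Gaussian-mechanism calibration (constants are illustrative):

```python
import numpy as np

# Gaussian mechanism for the mean. Swapping one x_i in [-1,1]^d moves
# the mean by at most 2*sqrt(d)/n in L2 norm (the sensitivity).
def private_mean(X, eps, delta):
    n, d = X.shape
    sensitivity = 2 * np.sqrt(d) / n
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / eps
    # Per-coordinate noise sigma; total expected L2 error is about
    # sigma * sqrt(d) = (d/n) * sqrt(8 * ln(1.25/delta)) / eps.
    return X.mean(axis=0) + np.random.normal(scale=sigma, size=d)

X = np.random.uniform(-1, 1, size=(1000, 20))
print(private_mean(X, eps=1.0, delta=1e-6))
```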

Fingerprinting Codes

A Lower Bound Strategy

Feed \(\mathcal{M}\) a marked input \(X\)

Assume \(\mathcal{M}\) is accurate: then an adversary can detect some \(x_i\) from \(\mathcal{M}(X)\) with high probability

\((\varepsilon,\delta)\)-DP implies the adversary also detects \(x_i\) on \(\mathcal{M}(X')\) with \(X' = X - \{x_i\} + \{z\}\)

But \(x_i\) is not in \(X'\): CONTRADICTION

Avoiding Pirated Movies via Fingerprinting

A movie owner distributes \(n\) copies, and the movie may leak! Given a pirated copy, can we detect one of the pirates?

Idea: Mark some of the scenes (Fingerprinting)

Fingerprinting Codes

\begin{pmatrix} 1 & 1 & 0 & \cdots & 1 & 0 \\ 0 & 1 & 1 & \cdots & 0 & 0 \\ 0 & 1 & 1 & \cdots & 0 & 1 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 1 & 1 & 0 & \cdots & 1 & 0 \end{pmatrix}

\(d\) scenes

\(n\) copies of the movie

1 = marked scene

0 = unmarked scene

Code usually randomized

We can do this with \(d = 2^n\). Can \(d\) be smaller?

Example of pirating: three pirates hold the codewords (columns = copies, rows = scenes)

\displaystyle \Bigg( \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} \; \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} \; \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \Bigg)

In scene 1 all copies are marked, so the pirated copy can only show a 1 there; in scenes 2 and 3 the copies disagree, so the pirates can output 0 or 1 (the "marking assumption")

Goal of fingerprinting: given a pirated copy of the movie, trace back at least one pirate, with probability of a false positive \(o(1/n)\)
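A toy sketch of what a coalition can produce under the marking assumption (illustrative only; real constructions such as Tardos codes are randomized and come with a tracing algorithm):

```python
import numpy as np

# Pirates can alter a scene only where their copies disagree.
rng = np.random.default_rng(0)
codewords = rng.integers(0, 2, size=(3, 8))    # 3 pirates, d = 8 scenes

forced = codewords.min(axis=0) == codewords.max(axis=0)  # all copies agree
pirated = np.where(forced,
                   codewords[0],                # must keep the common bit
                   rng.integers(0, 2, size=8))  # free choice elsewhere
print(codewords, pirated, sep="\n")
```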

Fingerprinting Codes for Lower Bounds

Same strategy, now instantiated with fingerprinting codes:

Feed \(\mathcal{M}\) a marked input \(X\): the codewords of a fingerprinting code with \(d = \tilde{O}(n^2)\) [Tardos '08]

If \(\mathcal{M}\) is accurate, its output acts as a pirated movie, so the tracing algorithm detects some \(x_i\)

\((\varepsilon,\delta)\)-DP implies the tracing algorithm also detects \(x_i\) on \(\mathcal{M}(X')\) with \(X' = X - \{x_i\} + \{z\}\), which breaks the false positive guarantee: CONTRADICTION

The Good, The Bad, and The Ugly of Codes

The Good: Leads to optimal lower bounds for a variety of problems

The Bad: Very restricted to binary inputs

The Ugly: Black-box use of FP codes makes them hard to adapt to other settings

Fingerprinting Lemmas

Idea: For some distribution on the input, the output is highly correlated with the input

Lemma (A 1D Fingerprinting Lemma, [Bun, Steinke, Ullman '16])

Let \(\mathcal{M} \colon [-1,1]^n \to [-1,1]\), let \(p \sim \mathrm{Unif}([-1,1])\), and let \(x_1, \dotsc, x_n \in \{\pm 1\}\) be independent with \(\mathbb{E}[x_i] = p\). Then

\displaystyle \mathbb{E}\Big[\sum_{i = 1}^n \underbrace{(\mathcal{M}(X) - p) \cdot (x_i - p)}_{\mathcal{A}(x_i, \mathcal{M}(X))} \Big] \geq \frac{1}{3} - \mathbb{E}\big[(\mathcal{M}(X) - p)^2\big]

where \(\mathcal{A}(x_i, \mathcal{M}(X))\) is the "correlation" between \(x_i\) and \(\mathcal{M}(X)\)
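A quick numerical sanity check of the lemma, taking \(\mathcal{M}\) to be the empirical mean (an illustrative choice, not part of the lemma):

```python
import numpy as np

# Estimate both sides of the 1D fingerprinting lemma by simulation.
rng = np.random.default_rng(1)
n, trials = 50, 20000
lhs, correction = 0.0, 0.0
for _ in range(trials):
    p = rng.uniform(-1, 1)
    x = np.where(rng.random(n) < (1 + p) / 2, 1.0, -1.0)  # E[x_i] = p
    m = x.mean()                                          # M(X) in [-1,1]
    lhs += ((m - p) * (x - p)).sum()
    correction += (m - p) ** 2
print(lhs / trials, ">=", 1 / 3 - correction / trials)    # lemma's claim
```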

Fingerprinting Lemma - Picture

\mathcal{A}(z, \mathcal{M}(X)) = (\mathcal{M}(X) - p) \cdot (z - p)

If \(\mathcal{M}\) is accurate: \(\mathbb{E}[\mathcal{A}(x_i, \mathcal{M}(X))]\) is large

If \(z\) is independent of \(X\): \(\mathbb{E}[|\mathcal{A}(z, \mathcal{M}(X))|]\) is small

Both depend on the distribution of \(X\) and \(p\)

[Figure: on a line, the points \(x_1, \dotsc, x_4\) and \(\mathcal{M}(X)\) concentrate around \(p\), while independent points \(z\) scatter away from it]

From 1D Lemma to a Code(-Like) Object

The Fingerprinting Lemma leads to a kind of fingerprinting code

Bonus: quite transparent and easy to describe

Key Idea: Make \(\tilde{O}(n^2)\) independent copies of the 1D lemma:

\(\mathcal{M} \colon ([-1,1]^d)^n \to [-1,1]^d\), \(p \sim \mathrm{Unif}([-1,1]^d)\), and \(x_1, \dotsc, x_n \in \{\pm 1\}^d\) random with \(\mathbb{E}[x_i] = p\). Then

\displaystyle \mathbb{E}\Big[\sum_{i = 1}^n \underbrace{\langle\mathcal{M}(X) - p, x_i - p\rangle}_{\mathcal{A}(x_i, \mathcal{M}(X))} \Big] \geq \frac{d}{10}

and, for \(d = \Omega(n^2 \log n)\), concentration gives

\displaystyle \mathbb{P}\Big[\sum_{i = 1}^n \mathcal{A}(x_i, \mathcal{M}(X)) \leq \frac{d}{20} \Big] \leq \frac{1}{n^3}

From Lemma to Lower Bounds

If \(\mathcal{M}\) is accurate, the correlation is high: if \(\mathbb{E}\big[\lVert \mathcal{M}(X) - p\rVert_2^2\big] \leq d/6\), then

\displaystyle \mathbb{E}\Big[\sum_{i = 1}^n \langle \mathcal{M}(X) - p, x_i - p \rangle \Big] \geq \frac{d}{6}

If \(\mathcal{M}\) is \((\varepsilon, \delta)\)-DP, the correlation is low: since \(\mathbb{E}[\mathcal{A}(x_i, \mathcal{M}(X))] \approx \mathbb{E}[\mathcal{A}(x_i, \mathcal{M}(X_{-i}))]\),

\displaystyle \mathbb{E}\Big[\sum_{i = 1}^n \mathcal{A}(x_i, \mathcal{M}(X)) \Big] \lesssim n \varepsilon \cdot \sqrt{\mathbb{E}\big[\lVert\mathcal{M}(X) - p\rVert_2^2\big]} + n d \delta

Combining the two bounds:

\displaystyle \frac{d}{n} \lesssim \sqrt{\mathbb{E}\big[\lVert\mathcal{M}(X) - p\rVert_2^2\big]}
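In symbols, the combination step (a sketch assuming \(\varepsilon \leq 1\) and \(\delta\) small enough that \(n d \delta \leq d/24\)):

\displaystyle \frac{d}{6} \leq \mathbb{E}\Big[\sum_{i=1}^n \mathcal{A}(x_i, \mathcal{M}(X))\Big] \lesssim n \varepsilon \cdot \sqrt{\mathbb{E}\big[\lVert\mathcal{M}(X) - p\rVert_2^2\big]} + \frac{d}{24}

\displaystyle \implies \sqrt{\mathbb{E}\big[\lVert\mathcal{M}(X) - p\rVert_2^2\big]} \gtrsim \frac{d}{n \varepsilon} \geq \frac{d}{n}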

Extension to Gaussian Case

Lemma (Gaussian Fingerprinting Lemma)

Let \(\mathcal{M}\colon \mathbb{R}^n \to \mathbb{R}\), \(\mu \sim \mathcal{N}(0, 1/2)\), and \(x_1, \dotsc, x_n \sim \mathcal{N}(\mu,1)\) i.i.d. Then

\displaystyle \mathbb{E}\Big[\sum_{i = 1}^n (\mathcal{M}(X) - \mu) \cdot (x_i - \mu) \Big] \geq \frac{1}{2} - \mathbb{E}\big[(\mathcal{M}(X) - \mu)^2\big]

One advantage of lemmas over codes:

Easier to extend to different settings

Implies similar lower bounds for privately estimating the mean of a Gaussian

Lower Bounds for Gaussian Covariance Matrix Estimation

Work done in collaboration with Nick Harvey

Privately Estimating a Covariance Matrix

\(x_1, x_2, \dotsc, x_n \sim \mathcal{N}(0, \Sigma)\) on \(\mathbb{R}^d\), with unknown covariance matrix \(\Sigma \succ 0\); write \(X \in \mathbb{R}^{d \times n}\)

Goal: an \((\varepsilon, \delta)\)-differentially private \(\mathcal{M}\) to estimate \(\Sigma\)

Known algorithmic results: there exists an \((\varepsilon, \delta)\)-DP \(\mathcal{M}\) with \(\mathbb{E}[\lVert\mathcal{M}(X) - \Sigma\rVert_F^2] \leq \alpha^2\) using

\displaystyle n = \tilde O\Big(\frac{d^2}{\alpha^2} + \frac{\log(1/\delta)}{\varepsilon} + \frac{d^2}{\alpha \varepsilon}\Big)

samples. The \(d^2/\alpha^2\) term is required even without privacy, and the \(\log(1/\delta)/\varepsilon\) term is required even for \(d = 1\). Is the \(d^2/(\alpha\varepsilon)\) term tight?
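For intuition, a hedged sketch of the most basic approach, empirical covariance plus Gaussian noise (illustrative only; this alone does not achieve the rate above):

```python
import numpy as np

# Clip entries to bound one sample's influence, then add symmetric
# Gaussian noise scaled to a crude Frobenius sensitivity bound.
def private_covariance(X, eps, delta, clip=1.0):
    d, n = X.shape
    Xc = np.clip(X, -clip, clip)
    emp = Xc @ Xc.T / n
    sens = 2 * d * clip**2 / n               # swapping one column moves
    sigma = sens * np.sqrt(2 * np.log(1.25 / delta)) / eps  # emp by <= sens
    Z = np.random.normal(scale=sigma, size=(d, d))
    return emp + (Z + Z.T) / 2               # keep the estimate symmetric
```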

Roadblocks to Fingerprinting Lemmas

Recall: \(x_1, x_2, \dotsc, x_n \sim \mathcal{N}(0, \Sigma)\) on \(\mathbb{R}^d\), with unknown covariance matrix \(\Sigma \succ 0\)

To get a Fingerprinting Lemma, we need a random \(\Sigma\)

Most FPLs are for \(d = 1\) and are then lifted with independent copies, which leads to limited lower bounds for covariance estimation [Kamath, Mouzakis, Singhal '22]

We can use diagonally dominant matrices, with diagonal entries \(\frac{3}{4} \pm \frac{1}{4d}\) and off-diagonal entries \(\pm \frac{1}{2d}\), but such matrices vary by only \(O(1)\) in Frobenius norm, so a trivial constant output already has error

\displaystyle \mathbb{E}\big[\lVert \mathcal{M}(X) - \Sigma \rVert_F^2\big] = O(1)

Hence we can't lower bound the accuracy of algorithms with \(\omega(1)\) error

Our Results

Theorem

For any \((\varepsilon, \delta)\)-DP algorithm \(\mathcal{M}\) such that

\displaystyle \mathbb{E}\big[\lVert\mathcal{M}(X) - \Sigma\rVert_F^2\big] \leq \alpha^2 = O(d)

and

\displaystyle \delta = O\Big( \frac{1}{n \ln n}\Big) \quad \text{(nearly the highest reasonable value)}

we have

\displaystyle n = \Omega\Big(\frac{d^2}{\alpha\varepsilon}\Big)

Previous \(n = \Omega\big(\tfrac{d^2}{\alpha\varepsilon}\big)\) lower bounds required either \(\delta = \tilde O\big(\tfrac{1}{d^2}\big) = o\big(\tfrac{1}{n}\big)\) [Kamath et al. '22] or \(\alpha = O(1)\) [Narayanan '23]; our result covers both regimes

Main Contribution: a Fingerprinting Lemma without independence

Which Distribution to Use?

Our result uses a very natural distribution: the Wishart distribution

\displaystyle \Sigma = \frac{1}{2d} \; G \; G^{T} \succeq 0, \quad G \text{ a } d \times 2d \text{ random Gaussian matrix}

A natural distribution over PSD matrices; its entries are highly correlated
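A one-liner sketch of sampling from this distribution (sizes are illustrative):

```python
import numpy as np

# Scaled Wishart prior from the formula above.
d = 5
G = np.random.default_rng(0).normal(size=(d, 2 * d))
Sigma = (G @ G.T) / (2 * d)        # PSD by construction; E[Sigma] = I
print(np.linalg.eigvalsh(Sigma))   # all eigenvalues are nonnegative
```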

A Different Correlation Statistic

A Peek Into the Proof for 1D

Lemma (Gaussian Fingerprinting Lemma)

\(\mu \sim \mathcal{N}(0, 1/2)\) and \(x_1, \dotsc, x_n \sim \mathcal{N}(\mu,1)\) i.i.d. Then

\displaystyle \mathbb{E}\Big[\sum_{i = 1}^n (\mathcal{M}(X) - \mu) \cdot (x_i - \mu) \Big] \geq \frac{1}{2} - \mathbb{E}\big[(\mathcal{M}(X) - \mu)^2\big]

Claim 1: with \(g(\mu) = \mathbb{E}_X[\mathcal{M}(X)]\),

\displaystyle \mathbb{E}_X\Big[\sum_{i = 1}^n (\mathcal{M}(X) - \mu) \cdot (x_i - \mu) \Big] = g'(\mu)

Claim 2 (Stein's Lemma):

\displaystyle \mathbb{E}[ g'(\mu)] = 2\, \mathbb{E}[ g(\mu)\, \mu]

Follows from integration by parts:

\displaystyle \mathbb{E}[ g'(\mu)] = \int g'(\mu) \cdot p(\mu)\, \mathrm{d} \mu
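Spelling out that step (with \(p\) the density of \(\mathcal{N}(0, \sigma^2)\), which satisfies \(p'(\mu) = -(\mu/\sigma^2)\, p(\mu)\); here \(\sigma^2 = 1/2\), and the boundary term vanishes since \(g\) grows slower than \(p\) decays):

\displaystyle \int g'(\mu)\, p(\mu)\, \mathrm{d}\mu = \big[g(\mu)\, p(\mu)\big]_{-\infty}^{\infty} - \int g(\mu)\, p'(\mu)\, \mathrm{d}\mu = \frac{1}{\sigma^2} \int g(\mu)\, \mu\, p(\mu)\, \mathrm{d}\mu = 2\, \mathbb{E}[g(\mu)\, \mu]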

A Peek Into the Proof of New FP Lemma

To prove the Fingerprinting Lemma, we need to lower bound

\displaystyle \mathbb{E}\Big[ \sum_{i,j}\partial_{ij} \; g(\Sigma)_{ij}\Big] = \mathbb{E}[\mathrm{div}\, g(\Sigma)], \quad \text{where } g(\Sigma) = \mathbb{E}[\mathcal{M}(X)]

\(\Sigma \sim\) Wishart leads to an elegant analysis via the Stein-Haff Identity: "move the derivative" from \(g\) to \(p\) with integration by parts,

\displaystyle \mathbb{E}[ \mathrm{div}\, g(\Sigma)] = \int \mathrm{div}\, g(\Sigma) \cdot p(\Sigma)\, \mathrm{d}\Sigma

this time using Stokes' Theorem

Takeaways

Differential Privacy gives a mathematically rigorous definition of what it means for an algorithm to be private

Interesting connections to other areas of theory

Fingerprinting Codes lead to many optimal lower bounds for DP

Fingerprinting Lemmas are more versatile for lower bounds and can be adapted to other settings

Our new Fingerprinting Lemma avoids the need to bootstrap a 1D result to higher dimensions via independent copies

Thanks!
