Presented by Sheng Long, Liren Shan, Lixu Wang
By Vitaly Feldman
Figure: generalization cartoon (image credit: Desh Raj)
Figure: example digits from the MNIST dataset
Differentially private (DP) learning cannot achieve the same accuracy as non-private learning because it cannot memorize the tail of the mixture distribution.
... so far it seems that learning does require memorization.
\(\to\) When does learning require memorization?
Goal: Find an algorithm \(\mathcal{A}\) that, given a sample set \(Z\), minimizes the expected generalization error, defined as
\(\texttt{err}(\pi, \mathcal{F} \mid \mathcal{A}) = \mathbb{E}_{D \sim \pi,\, f \sim \mathcal{F},\, Z \sim (D,f)^n}\left[\texttt{err}_{D,f}(\mathcal{A})\right]\)
where \(\texttt{err}_{D,f}(\mathcal{A}) = \Pr_{(x,y) \sim (D,f),\, h\sim \mathcal{A}(Z)} [h(x)\neq y]\)
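To make the setup concrete, here is a minimal simulation sketch (not from the paper; the Zipf-style prior, the sample sizes, and the table-lookup learner are illustrative assumptions). It samples a long-tailed marginal \(D\), a random labeling \(f\), a dataset \(Z\), and estimates \(\texttt{err}_{D,f}(\mathcal{A})\) by Monte Carlo for a learner that simply memorizes its training labels.

```python
# Toy illustration only: the Zipf-style prior, sample sizes, and the
# table-lookup learner are assumptions for demonstration, not the paper's
# construction.
import numpy as np

rng = np.random.default_rng(0)

N, n = 1000, 500                      # |X| and number of training samples
freqs = 1.0 / np.arange(1, N + 1)     # long-tailed (Zipf-like) frequencies
D = freqs / freqs.sum()               # marginal distribution over X
f = rng.integers(0, 2, size=N)        # random labeling function f: X -> {0, 1}

Z_x = rng.choice(N, size=n, p=D)      # training inputs drawn i.i.d. from D
Z_y = f[Z_x]                          # training labels y = f(x)

seen = dict(zip(Z_x.tolist(), Z_y.tolist()))
default = int(np.round(Z_y.mean()))   # majority training label

def h(x):
    # A(Z): a learner that memorizes Z -- return the stored label for any x
    # seen in Z, otherwise predict the majority training label.
    return seen.get(int(x), default)

# Monte Carlo estimate of err_{D,f}(A) = Pr_{(x,y) ~ (D,f)}[h(x) != y]
test_x = rng.choice(N, size=20_000, p=D)
err = np.mean([h(x) != f[x] for x in test_x])
print(f"estimated generalization error: {err:.3f}")
```

With these (assumed) parameters, all of the remaining error comes from test points that never appeared in \(Z\): on points it has seen, the memorizing learner is perfect.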
Theorem 2.3 (informal): Given a frequency prior \(\pi\) and a distribution \(\mathcal{F}\) over labeling functions, we have for every algorithm \(\mathcal{A}\) and every dataset \(Z \in (X \times Y)^n\):
\(\texttt{err}(\pi, \mathcal{F} \mid \mathcal{A}, Z) \geq \texttt{opt}(\pi, \mathcal{F} \mid Z) + \sum_{\ell \in [n]} \tau_\ell \cdot \texttt{errn}_Z(\mathcal{A}, \ell)\)
where \(\texttt{err}(\pi, \mathcal{F} \mid \mathcal{A}, Z)\) and \(\texttt{opt}(\pi, \mathcal{F} \mid Z)\) are the expected and the best achievable generalization error conditioned on the dataset \(Z\), \(\tau_\ell > 0\) is a coefficient determined by \(\pi\) and \(n\), \(\texttt{errn}_Z(\mathcal{A},\ell) = \mathbb{E}_{h\sim \mathcal{A}(Z)}[\texttt{errn}_Z(h,\ell)]\),
and \(\texttt{errn}_Z(h,\ell)\) is the sum of empirical error rates for data points that appear exactly \(\ell\) times in \(Z\).
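As a notation check, this sketch (a continuation of the toy example above; the normalization by \(n\) is an assumption of the sketch, not taken from the talk) computes \(\texttt{errn}_Z(h,\ell)\) for training points that appear exactly \(\ell\) times in \(Z\), comparing a memorizing predictor with one that ignores the stored labels.

```python
# Continuation of the previous sketch: reuses Z_x, Z_y, h, and default.
from collections import Counter

def errn(h, Z_x, Z_y, ell):
    # Empirical error of h on the training points that appear exactly `ell`
    # times in Z, normalized by n (the normalization is an assumption of
    # this sketch).
    counts = Counter(int(x) for x in Z_x)
    bad = sum(1 for x, y in zip(Z_x, Z_y) if counts[int(x)] == ell and h(x) != y)
    return bad / len(Z_x)

def h_blind(x):
    # A learner that ignores the memorized labels entirely.
    return default

for ell in (1, 2, 3):
    print(f"ell={ell}: memorizing h -> {errn(h, Z_x, Z_y, ell):.3f}, "
          f"blind h -> {errn(h_blind, Z_x, Z_y, ell):.3f}")
```

The memorizing learner drives every \(\texttt{errn}_Z(h,\ell)\) to zero, while the blind one incurs error at each multiplicity level, which is the quantity the bound above charges an algorithm for.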
Theorem 3.2. The generalization error of learning algorithm \(\mathcal{A}\)
where \(S^{\setminus i}\) denotes the dataset that is \(S\) with \((x_i, y_i)\) removed.
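The leave-one-out dataset \(S^{\setminus i}\) is the same object used in Feldman's label-memorization score, \(\texttt{mem}(\mathcal{A}, S, i) = \Pr_{h \sim \mathcal{A}(S)}[h(x_i) = y_i] - \Pr_{h \sim \mathcal{A}(S^{\setminus i})}[h(x_i) = y_i]\). Below is a rough Monte Carlo sketch of estimating it; `train`, `trials`, and the toy lookup learner are placeholders, not the paper's procedure.

```python
# Rough sketch: `train` stands in for any (possibly randomized) learning
# algorithm A mapping a dataset to a predictor; `trials` is an assumed
# Monte Carlo budget.
import numpy as np

def memorization_score(train, S, i, trials=50):
    # mem(A, S, i) = Pr_{h~A(S)}[h(x_i) = y_i] - Pr_{h~A(S^{\i})}[h(x_i) = y_i]
    x_i, y_i = S[i]
    S_minus_i = S[:i] + S[i + 1:]      # the leave-one-out dataset S^{\i}
    p_with = np.mean([train(S)(x_i) == y_i for _ in range(trials)])
    p_without = np.mean([train(S_minus_i)(x_i) == y_i for _ in range(trials)])
    return p_with - p_without

# Tiny usage with a table-lookup learner (purely illustrative).
def train(S):
    table = {x: y for x, y in S}
    return lambda x: table.get(x, 0)

S = [(0, 1), (1, 0), (2, 1)]
print(memorization_score(train, S, i=0))   # ~1.0: the label of x_0 is memorized
```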
Questions?