While it is tempting to believe that data distillation preserves privacy, the empirical robustness of distilled data against known attacks does not imply a provable privacy guarantee. Here, we develop a provably privacy-preserving data distillation algorithm, called differentially private kernel inducing points (DP-KIP). DP-KIP is an instantiation of DP-SGD on kernel ridge regression (KRR). Following recent work, we use neural tangent kernels and minimize the KRR loss to estimate the distilled datapoints (i.e., kernel inducing points). We provide a computationally efficient JAX implementation of DP-KIP, which we test on several popular image and tabular datasets to show its efficacy in data distillation with differential privacy guarantees.
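To make the idea of DP-SGD on the KRR loss concrete, the following is a minimal JAX sketch of a single update of the distilled inputs. It is an illustration under stated assumptions, not the released implementation: an RBF kernel stands in for the neural tangent kernel, only the distilled inputs (not their labels) are updated, and the names `rbf_kernel`, `per_example_loss`, `dp_kip_step`, `reg`, `clip`, and `noise_mult` are hypothetical. Privacy accounting (converting the noise multiplier and number of steps into an (ε, δ) guarantee) is not shown.

```python
import jax
import jax.numpy as jnp


def rbf_kernel(a, b, gamma=0.5):
    # Stand-in kernel (assumption): the abstract specifies a neural tangent kernel.
    sq_dists = jnp.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return jnp.exp(-gamma * sq_dists)


def per_example_loss(x_s, y_s, x, y, reg=1e-3):
    # KRR prediction for ONE real example (x, y) from the distilled set (x_s, y_s).
    k_ss = rbf_kernel(x_s, x_s)                      # (m, m)
    k_xs = rbf_kernel(x[None, :], x_s)               # (1, m)
    alpha = jnp.linalg.solve(k_ss + reg * jnp.eye(x_s.shape[0]), y_s)
    pred = (k_xs @ alpha)[0]                         # (num_classes,)
    return 0.5 * jnp.sum((pred - y) ** 2)


def dp_kip_step(key, x_s, y_s, x_batch, y_batch, lr=0.1, clip=1.0, noise_mult=1.0):
    # Per-example gradients of the KRR loss with respect to the distilled inputs.
    grad_fn = jax.grad(per_example_loss)             # differentiates w.r.t. x_s
    grads = jax.vmap(grad_fn, in_axes=(None, None, 0, 0))(
        x_s, y_s, x_batch, y_batch
    )                                                # (batch, m, d)

    # Clip each per-example gradient to L2 norm at most `clip`.
    norms = jnp.sqrt(jnp.sum(grads ** 2, axis=(1, 2)))
    scale = jnp.minimum(1.0, clip / (norms + 1e-12))
    clipped = grads * scale[:, None, None]

    # Add Gaussian noise calibrated to the clipping norm, then average.
    noise = noise_mult * clip * jax.random.normal(key, x_s.shape)
    noisy_mean = (jnp.sum(clipped, axis=0) + noise) / x_batch.shape[0]

    # Gradient step on the distilled inputs.
    return x_s - lr * noisy_mean
```

Clipping bounds the influence of any single training example on the update, and the Gaussian noise is scaled to that bound; these two ingredients are what the DP-SGD analysis relies on. In practice one would repeat this step over subsampled minibatches with a fresh PRNG key each time and track the cumulative privacy loss with a separate accountant.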