We initiate the study of differentially private (DP) estimation with access to a small amount of public data. For private estimation of d-dimensional Gaussians, we assume that the public data comes from a Gaussian that may have vanishing similarity in total variation distance with the underlying Gaussian of the private data. We show that, under the constraints of pure or concentrated DP, d+1 public samples suffice to remove any dependence on the range parameters of the private data distribution from the private sample complexity; without public data, such a dependence is known to be necessary. For separated Gaussian mixtures, we assume that the underlying public and private distributions are the same, and we consider two settings: (1) given a dimension-independent amount of public data, the private sample complexity can be improved polynomially in the number of mixture components, and any dependence on the range parameters of the distribution can be removed in the approximate DP case; (2) given an amount of public data linear in the dimension, the private sample complexity can be made independent of the range parameters even under concentrated DP, and further improvements can be made to the overall sample complexity.
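As a rough intuition for why a handful of public samples can stand in for a priori range bounds, the following one-dimensional toy (d = 1, so d + 1 = 2 public samples) may help. It is a minimal sketch under stated assumptions, not the algorithm analyzed in this work: public samples give a coarse center and scale, the private data is standardized and clipped, and a Gaussian mechanism calibrated for concentrated DP is applied. The function name `public_private_mean` and the parameters `clip` and `rho` are hypothetical, and the sketch assumes the public and private distributions are close, which is stronger than the assumption made above.

```python
import math
import random
import statistics


def public_private_mean(public, private, rho, clip=6.0, rng=None):
    """Toy 1-D mean estimator that preconditions private data with public data.

    Assumption: illustrative only; `clip` and `rho` are hypothetical names.
    The output satisfies rho-zCDP with respect to `private` because the
    center and scale depend on public data alone.
    """
    rng = rng or random.Random()
    # Coarse location and scale from public data only (no privacy cost).
    center = statistics.mean(public)
    scale = statistics.stdev(public)  # needs at least 2 distinct public points
    # Standardize private points with the public estimates, then clip.
    z = [max(-clip, min(clip, (x - center) / scale)) for x in private]
    # Gaussian mechanism: replacing one record moves the clipped mean by at
    # most 2 * clip / n, and sigma = sensitivity / sqrt(2 * rho) gives rho-zCDP.
    n = len(z)
    sensitivity = 2.0 * clip / n
    sigma = sensitivity / math.sqrt(2.0 * rho)
    noisy = sum(z) / n + rng.gauss(0.0, sigma)
    # Undo the standardization to return to the original units.
    return center + scale * noisy


if __name__ == "__main__":
    rng = random.Random(0)
    private = [rng.gauss(1e6, 1e3) for _ in range(2000)]
    public = [1e6 - 800.0, 1e6 + 900.0]
    print(public_private_mean(public, private, rho=0.5, rng=rng))
```

Note that the clipping radius is expressed in public-data units, so the caller never supplies a bound on the mean or variance of the private distribution. The toy fails if the two public points coincide or are far from typical; handling such cases rigorously is beyond this sketch.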