We study the problem of privately estimating the parameters of $d$-dimensional Gaussian Mixture Models (GMMs) with $k$ components. To this end, we develop a technique that reduces the problem to its non-private counterpart. This allows us to privatize existing non-private algorithms in a black-box manner, while incurring only a small overhead in sample complexity and running time. As the main application of our framework, we develop an $(\varepsilon, \delta)$-differentially private algorithm for learning GMMs that uses the non-private algorithm of Moitra and Valiant [MV10] as a black box. Consequently, this gives the first sample complexity upper bound and the first polynomial-time algorithm for privately learning GMMs without any boundedness assumptions on the parameters. As part of our analysis, we prove a lower bound on the total variation distance between high-dimensional Gaussians that is tight up to a constant factor, which may be of independent interest.