We study the problem of estimating mixtures of Gaussians (GMMs) under the constraint of differential privacy (DP). Our main result is that $\tilde{O}(k^2 d^4 \log(1/\delta) / (\alpha^2 \varepsilon))$ samples are sufficient to estimate a mixture of $k$ Gaussians in $d$ dimensions up to total variation distance $\alpha$ while satisfying $(\varepsilon, \delta)$-DP. This is the first finite sample-complexity upper bound for the problem that makes no structural assumptions on the GMMs. To solve the problem, we devise a new framework that may be useful for other tasks. At a high level, we show that if a class of distributions (such as Gaussians) is (1) list decodable and (2) admits a "locally small" cover [BKSW19] with respect to total variation distance, then the class of its mixtures is privately learnable. The proof circumvents a known barrier indicating that, unlike Gaussians, GMMs do not admit a locally small cover [AAL21].
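For reference, the guarantee can be written out with the standard definitions that the abstract relies on but does not spell out. This is a sketch under assumed notation, none of which appears in the abstract itself: $M$ is the randomized learner, $X \sim X'$ denotes two datasets of size $n$ that differ in a single sample, $f$ is the target mixture, and $\hat{f} = M(X)$ is the output. The constant success probability $2/3$ is likewise an assumption, since the abstract suppresses it.

$$
\begin{aligned}
&\text{Privacy:} && \Pr[M(X) \in S] \le e^{\varepsilon} \Pr[M(X') \in S] + \delta \quad \text{for all } X \sim X' \text{ and all measurable } S, \\
&\text{Accuracy:} && d_{\mathrm{TV}}(f, \hat{f}) = \sup_{A} \bigl| f(A) - \hat{f}(A) \bigr| \le \alpha \quad \text{with probability at least } 2/3, \\
&\text{Samples:} && n = \tilde{O}\!\left( \frac{k^2 d^4 \log(1/\delta)}{\alpha^2 \varepsilon} \right).
\end{aligned}
$$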