We study the problem of estimating mixtures of Gaussians under the constraint of differential privacy (DP). Our main result is that $\tilde{O}(k^2 d^4 \log(1/\delta) / (\alpha^2 \varepsilon))$ samples are sufficient to estimate a mixture of $k$ Gaussians in $d$ dimensions up to total variation distance $\alpha$ while satisfying $(\varepsilon, \delta)$-DP. This is the first finite sample complexity upper bound for the problem that does not make any structural assumptions on the Gaussian mixture models (GMMs). To solve the problem, we devise a new framework which may be useful for other tasks. At a high level, we show that if a class of distributions (such as Gaussians) is (1) list decodable and (2) admits a "locally small" cover (Bun et al., 2021) with respect to total variation distance, then the class of its mixtures is privately learnable. The proof circumvents a known barrier indicating that, unlike Gaussians, GMMs do not admit a locally small cover (Aden-Ali et al., 2021b).
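For reference, the privacy guarantee named above is the standard notion of $(\varepsilon, \delta)$-DP, which the abstract uses without restating. The symbols $M$ (the randomized learning algorithm), $X, X'$ (datasets differing in a single sample), and $S$ (a measurable set of outputs) are notation introduced here for illustration and do not appear in the original text. $M$ satisfies $(\varepsilon, \delta)$-DP if, for all such $X, X'$ and all $S$,

$$\Pr[M(X) \in S] \le e^{\varepsilon} \, \Pr[M(X') \in S] + \delta .$$

Under this constraint, the stated bound says that a number of samples $n$ of order

$$n = \tilde{O}\!\left(\frac{k^2 d^4 \log(1/\delta)}{\alpha^2 \varepsilon}\right)$$

suffices, where $\tilde{O}$ hides polylogarithmic factors. For example, doubling the dimension $d$ raises the bound by a factor of about $2^4 = 16$, and halving the accuracy parameter $\alpha$ raises it by a factor of about $4$, up to those hidden factors.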