We study the problem of estimating mixtures of Gaussians under the constraint of differential privacy (DP). Our main result is that $\text{poly}(k,d,1/\alpha,1/\varepsilon,\log(1/\delta))$ samples are sufficient to estimate a mixture of $k$ Gaussians in $\mathbb{R}^d$ up to total variation distance $\alpha$ while satisfying $(\varepsilon, \delta)$-DP. This is the first finite sample complexity upper bound for the problem that does not make any structural assumptions on the Gaussian mixture models (GMMs). To solve the problem, we devise a new framework that may be useful for other tasks. At a high level, we show that if a class of distributions (such as Gaussians) is (1) list decodable and (2) admits a "locally small" cover (Bun et al., 2021) with respect to total variation distance, then the class of its mixtures is privately learnable. The proof circumvents a known barrier indicating that, unlike Gaussians, GMMs do not admit a locally small cover (Aden-Ali et al., 2021b).