In this work, we give a ${\rm poly}(d,k)$ time and sample algorithm for learning the parameters of a mixture of $k$ spherical distributions in $d$ dimensions. Unlike all previous methods, our techniques apply to heavy-tailed distributions, including examples that do not even have finite covariances. Our method succeeds whenever the cluster distributions have a characteristic function with sufficiently heavy tails. Such distributions include the Laplace distribution but crucially exclude Gaussians. All previous methods for learning mixture models relied, implicitly or explicitly, on low-degree moments. Even for the case of Laplace distributions, we prove that any such algorithm must use super-polynomially many samples. Our method thus adds to the short list of techniques that bypass the limitations of the method of moments. Somewhat surprisingly, our algorithm does not require any minimum separation between the cluster means. This is in stark contrast to spherical Gaussian mixtures, where a minimum $\ell_2$-separation is provably necessary even information-theoretically [Regev and Vijayaraghavan '17]. Our methods compose well with existing techniques, yielding ``best of both worlds'' guarantees for mixtures in which every component either has a heavy-tailed characteristic function or has a sub-Gaussian tail with a light-tailed characteristic function. Our algorithm is based on a new approach to learning mixture models via efficient high-dimensional sparse Fourier transforms, and we believe this method will find further applications in statistical estimation. As an example, we give an algorithm for consistent robust mean estimation against noise-oblivious adversaries, a model practically motivated by the literature on multiple hypothesis testing. It was formally proposed in a recent Master's thesis by one of the authors and has already inspired follow-up works.
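The phenomenon underlying the approach can be illustrated numerically (this is not the paper's algorithm, only a one-dimensional sketch with hypothetical mixture parameters): the characteristic function of a location mixture is a sparse sum of complex exponentials at the component means, modulated by the component characteristic function. For a Laplace component with scale $b$, the modulation $1/(1+b^2t^2)$ decays only polynomially, so at a moderately large frequency $t$ the empirical characteristic function still carries the mean-frequency information $\sum_j w_j e^{it\mu_j}$, whereas a Gaussian modulation $e^{-t^2/2}$ would have wiped it out.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 1-D mixture of two Laplace components (means 0 and 3, scale 1).
means = np.array([0.0, 3.0])
weights = np.array([0.5, 0.5])
scale = 1.0

comp = rng.choice(2, size=200_000, p=weights)
x = rng.laplace(loc=means[comp], scale=scale)

t = 5.0  # a moderately large frequency
ecf = np.mean(np.exp(1j * t * x))  # empirical characteristic function at t

# Laplace CF at location mu, scale b:  e^{i t mu} / (1 + b^2 t^2).
# The mixture CF is  sum_j w_j e^{i t mu_j} * 1/(1 + (scale*t)^2):
pred = (weights * np.exp(1j * t * means)).sum() / (1 + (scale * t) ** 2)

# The modulation 1/26 is still detectable; a Gaussian CF e^{-t^2/2} at t = 5
# would be ~3.7e-6, drowning the mean frequencies in sampling noise.
print(abs(ecf - pred))  # small: empirical CF matches the mixture formula
```

The sampling error of the empirical characteristic function is $O(1/\sqrt{n})$, so with $n = 2\times 10^5$ samples the polynomially decaying Laplace modulation leaves the frequencies $e^{it\mu_j}$ well above the noise floor at this $t$, which is exactly the window a sparse Fourier-transform approach can exploit.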