Learning Mixture Models via Efficient High-dimensional Sparse Fourier Transforms

In this work, we give an algorithm with ${\rm poly}(d,k)$ time and sample complexity for learning the parameters of a mixture of $k$ spherical distributions in $d$ dimensions. Unlike all previous methods, our techniques apply to heavy-tailed distributions, including examples that do not even have finite covariances. Our method succeeds whenever the cluster distributions have a characteristic function with sufficiently heavy tails. Such distributions include the Laplace distribution but crucially exclude Gaussians. All previous methods for learning mixture models relied implicitly or explicitly on the low-degree moments. Even for the case of Laplace distributions, we prove that any such algorithm must use super-polynomially many samples. Our method thus adds to the short list of techniques that bypass the limitations of the method of moments. Somewhat surprisingly, our algorithm does not require any minimum separation between the cluster means. This is in stark contrast to spherical Gaussian mixtures, where a minimum $\ell_2$-separation is provably necessary even information-theoretically [Regev and Vijayaraghavan '17]. Our methods compose well with existing techniques and yield "best of both worlds" guarantees for mixtures where every component either has a heavy-tailed characteristic function or has a sub-Gaussian tail with a light-tailed characteristic function. Our algorithm is based on a new approach to learning mixture models via efficient high-dimensional sparse Fourier transforms. We believe that this method will find more applications to statistical estimation. As an example, we give an algorithm for consistent robust mean estimation against noise-oblivious adversaries, a model practically motivated by the literature on multiple hypothesis testing. This model was formally proposed in a recent Master's thesis by one of the authors and has already inspired follow-up works.
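To illustrate why characteristic functions connect mixture learning to sparse Fourier transforms, the following is a minimal one-dimensional sketch; it is not the algorithm of this work. It assumes the component distribution (here Laplace with unit scale) is known, and all names (`laplace_cf`, `empirical_cf`, the chosen weights, means, and frequency grid) are hypothetical choices made for illustration. The characteristic function of a mixture of shifted copies of one distribution factors as the component's characteristic function times a weighted sum of complex exponentials, one per cluster mean. Dividing the empirical characteristic function by the component's leaves an approximately $k$-sparse Fourier signal whose frequencies are the means. For a Laplace component the divisor decays only polynomially, so the division amplifies sampling noise mildly; for a Gaussian it decays like $e^{-t^2/2}$, which offers one intuition for why Gaussians fall outside this approach.

```python
import numpy as np

def laplace_cf(t, scale=1.0):
    """Characteristic function of a zero-mean Laplace distribution: 1 / (1 + scale^2 t^2)."""
    return 1.0 / (1.0 + (scale * t) ** 2)

def empirical_cf(samples, t):
    """Empirical characteristic function: average of exp(i * t * x) over the samples."""
    return np.exp(1j * np.outer(t, samples)).mean(axis=1)

# Hypothetical mixture parameters, chosen only for this illustration.
rng = np.random.default_rng(0)
weights = np.array([0.5, 0.5])
means = np.array([-1.0, 2.0])
n = 200_000

# Draw samples: pick a cluster, then add zero-mean Laplace noise to its mean.
labels = rng.choice(len(weights), size=n, p=weights)
samples = means[labels] + rng.laplace(loc=0.0, scale=1.0, size=n)

# Evaluate on a small frequency grid and divide out the known component factor.
t = np.linspace(-2.0, 2.0, 9)
sparse_signal = empirical_cf(samples, t) / laplace_cf(t)

# The result should approximate sum_j w_j * exp(i * t * mu_j), a 2-sparse Fourier signal.
exact = np.exp(1j * np.outer(t, means)) @ weights
max_err = np.max(np.abs(sparse_signal - exact))
print(f"max deviation from the sparse exponential sum: {max_err:.4f}")
```

Recovering the means from such a signal is the sparse Fourier transform step; doing so efficiently in $d$ dimensions is the contribution described above and is not attempted in this sketch.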
