We study the problem of list-decodable Gaussian covariance estimation. Given a multiset $T$ of $n$ points in $\mathbb R^d$ such that an unknown $\alpha<1/2$ fraction of the points in $T$ are i.i.d. samples from an unknown Gaussian $\mathcal{N}(\mu, \Sigma)$, the goal is to output a list of $O(1/\alpha)$ hypotheses, at least one of which is close to $\Sigma$ in relative Frobenius norm. Our main result is an algorithm for this task with $\mathrm{poly}(d,1/\alpha)$ sample complexity and running time that guarantees relative Frobenius norm error of $\mathrm{poly}(1/\alpha)$. Importantly, our algorithm relies purely on spectral techniques. As a corollary, we obtain an efficient spectral algorithm for robust partial clustering of Gaussian mixture models (GMMs), which is a key ingredient in the recent work of [BDJ+22] on robustly learning arbitrary GMMs. Combined with the other components of [BDJ+22], our new method yields the first Sum-of-Squares-free algorithm for robustly learning GMMs. At the technical level, we develop a novel multi-filtering method for list-decodable covariance estimation that may be useful in other settings.
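To make the problem setup concrete, here is a minimal sketch of the input model and the success criterion. It does not implement the multi-filtering algorithm. It assumes the usual definition of relative Frobenius error, $\|\Sigma^{-1/2}\hat\Sigma\,\Sigma^{-1/2} - I\|_F$, and a list counts as successful if its best hypothesis has small error. All function names are hypothetical, and the outlier process is a placeholder: in the actual model the remaining $(1-\alpha)n$ points may be arbitrary.

```python
import numpy as np

def inv_sqrt_psd(sigma):
    # Symmetric inverse square root V diag(w^{-1/2}) V^T (assumes sigma is positive definite).
    w, v = np.linalg.eigh(sigma)
    return (v / np.sqrt(w)) @ v.T

def relative_frobenius_error(sigma_hat, sigma):
    # ||Sigma^{-1/2} Sigma_hat Sigma^{-1/2} - I||_F (assumed definition of the error metric).
    s = inv_sqrt_psd(sigma)
    return np.linalg.norm(s @ sigma_hat @ s - np.eye(sigma.shape[0]), ord="fro")

def best_in_list(hypotheses, sigma):
    # List-decoding only requires that SOME hypothesis is close, so report the minimum error.
    return min(relative_frobenius_error(h, sigma) for h in hypotheses)

def make_list_decodable_sample(n, alpha, mu, sigma, outlier_sampler, rng):
    # About alpha*n inliers from N(mu, sigma); the rest come from a user-supplied
    # (hypothetical) outlier process standing in for arbitrary points.
    n_in = int(round(alpha * n))
    inliers = rng.multivariate_normal(mu, sigma, size=n_in)
    outliers = outlier_sampler(n - n_in, rng)
    # Shuffle rows so inliers are not identifiable by position.
    return rng.permutation(np.vstack([inliers, outliers]))

rng = np.random.default_rng(0)
d = 5
mu = np.zeros(d)
A = rng.standard_normal((d, d))
sigma = A @ A.T + np.eye(d)  # a random, well-conditioned covariance
T = make_list_decodable_sample(
    1000, 0.1, mu, sigma, lambda m, r: r.normal(10.0, 1.0, size=(m, d)), rng
)
hypotheses = [np.eye(d), sigma, 2 * sigma]
print(T.shape, best_in_list(hypotheses, sigma))
```

Because $\Sigma$ itself is in the list, the best error is zero up to floating-point round-off. For the hypothesis $2\Sigma$, the normalized matrix is $2I - I = I$, so the error is exactly $\sqrt{d}$.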