We study the complexity of learning mixtures of separated Gaussians with a common unknown bounded covariance matrix. Specifically, we focus on learning Gaussian mixture models (GMMs) on $\mathbb{R}^d$ of the form $P= \sum_{i=1}^k w_i \mathcal{N}(\boldsymbol \mu_i,\mathbf \Sigma_i)$, where $\mathbf \Sigma_i = \mathbf \Sigma \preceq \mathbf I$ and $\min_{i \neq j} \| \boldsymbol \mu_i - \boldsymbol \mu_j\|_2 \geq k^\epsilon$ for some $\epsilon>0$. Known learning algorithms for this family of GMMs have complexity $(dk)^{O(1/\epsilon)}$. In this work, we prove that any Statistical Query (SQ) algorithm for this problem requires complexity at least $d^{\Omega(1/\epsilon)}$. In the special case where the separation is on the order of $k^{1/2}$, we additionally obtain fine-grained SQ lower bounds with the correct exponent. Our SQ lower bounds imply similar lower bounds for low-degree polynomial tests. Conceptually, our results provide evidence that known algorithms for this problem are nearly best possible.
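To make the model family concrete, the following is a minimal Python sketch of one instance satisfying the two constraints above: a common covariance $\mathbf \Sigma \preceq \mathbf I$ and pairwise mean separation at least $k^\epsilon$. Everything in it is an illustrative assumption rather than a construction from this work: the function names, the choice of a diagonal $\mathbf \Sigma$, uniform weights $w_i = 1/k$, and means placed on scaled coordinate axes (which requires $k \le d$). In particular, it is not the hard instance behind the SQ lower bound.

```python
import math
import random


def make_separated_gmm(d, k, eps, rng):
    """Return (weights, means, variances) for a toy separated GMM.

    Assumption: Sigma is diagonal, stored as a list of per-coordinate variances.
    """
    if k > d:
        raise ValueError("this toy construction needs k <= d")
    sep = k ** eps
    # Mean i is sep * e_i, so every pairwise distance is sqrt(2) * sep >= k**eps.
    means = [[sep if j == i else 0.0 for j in range(d)] for i in range(k)]
    # Diagonal entries in (0, 1] give Sigma <= I in the PSD order.
    variances = [rng.uniform(0.1, 1.0) for _ in range(d)]
    weights = [1.0 / k] * k
    return weights, means, variances


def sample_gmm(weights, means, variances, n, rng):
    """Draw n labelled samples (component index, point) from the mixture."""
    samples = []
    for _ in range(n):
        i = rng.choices(range(len(weights)), weights=weights)[0]
        x = [m + rng.gauss(0.0, math.sqrt(v)) for m, v in zip(means[i], variances)]
        samples.append((i, x))
    return samples


def min_separation(means):
    """Smallest Euclidean distance between two distinct component means."""
    return min(
        math.dist(a, b) for idx, a in enumerate(means) for b in means[idx + 1:]
    )
```

For example, with `d=6`, `k=4`, and `eps=0.5`, `min_separation` returns $2\sqrt{2}$, which exceeds the required $k^\epsilon = 2$.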