We study the gradient Expectation-Maximization (EM) algorithm for Gaussian Mixture Models (GMMs) in the over-parameterized setting, where a general GMM with $n>1$ components learns from data generated by a single ground-truth Gaussian distribution. While results for the special case of 2-Gaussian mixtures are well-known, a general global convergence analysis for arbitrary $n$ has remained open and faces several new technical barriers, since the convergence becomes sublinear and non-monotonic. To address these challenges, we construct a novel likelihood-based convergence analysis framework and rigorously prove that gradient EM converges globally at a sublinear rate $O(1/\sqrt{t})$. This is the first global convergence result for Gaussian mixtures with more than $2$ components. The sublinear rate is due to the algorithmic nature of learning over-parameterized GMMs with gradient EM. We also identify a new technical challenge in learning general over-parameterized GMMs: the existence of bad local regions that can trap gradient EM for an exponential number of steps.
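For concreteness, the following is a minimal sketch of the population gradient EM update in a standard formulation of this setting, assuming known identity covariances and fixed mixing weights $\pi_1,\dots,\pi_n$ (this notation is illustrative and not taken from the paper itself):
\[
\mu_j^{t+1} \;=\; \mu_j^{t} \;+\; \eta \,\mathbb{E}_{x\sim \mathcal{N}(\mu^\star, I)}\!\Big[ w_j^{t}(x)\,\big(x - \mu_j^{t}\big) \Big],
\qquad
w_j^{t}(x) \;=\; \frac{\pi_j\,\phi\!\big(x;\mu_j^{t}, I\big)}{\sum_{k=1}^{n}\pi_k\,\phi\!\big(x;\mu_k^{t}, I\big)},
\]
where $\phi(\cdot;\mu,I)$ is the Gaussian density, $w_j^{t}(x)$ is the posterior responsibility of component $j$, $\eta$ is the step size, and the data distribution is the single ground-truth Gaussian $\mathcal{N}(\mu^\star, I)$. The update is one gradient-ascent step on the population log-likelihood, which is the step whose global convergence is analyzed here.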