We study the trajectory of the iterates and the convergence rates of the Expectation-Maximization (EM) algorithm for two-component Mixed Linear Regression (2MLR). The fundamental goal of MLR is to learn the regression models from unlabeled observations, and the EM algorithm is widely used to fit mixtures of linear regressions. Recent results have established the super-linear convergence of EM for 2MLR in the noiseless and high-SNR settings under certain assumptions, and its global convergence rate with random initialization has been established. However, the exponent of convergence has not been theoretically estimated, and the geometric properties of the trajectory of EM iterations are not well understood. In this paper, we first use Bessel functions to provide explicit closed-form expressions for the EM updates under all SNR regimes. Then, in the noiseless setting, we completely characterize the behavior of EM iterations by deriving a recurrence relation at the population level, and notably show that all the iterations lie on a certain cycloid. Based on this new trajectory-based analysis, we give a theoretical estimate of the exponent of super-linear convergence and further improve the statistical error bound at the finite-sample level. Our analysis provides a new framework for studying the behavior of EM for Mixed Linear Regression.
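For readers who want a concrete picture of the iteration being analyzed, the following is a minimal sample-level sketch, not the paper's population-level derivation. It assumes the symmetric, balanced form of 2MLR, y = z⟨x, θ*⟩ + noise with z = ±1 equally likely, and a known noise level `sigma`; under these assumptions the E-step weights reduce to a `tanh` term, which becomes a sign in the noiseless limit. The function names `em_step` and `run_em` are illustrative and do not come from the paper.

```python
import numpy as np


def em_step(theta, X, y, sigma):
    """One EM update for symmetric 2MLR (assumed model: y = z * <x, theta*> + noise).

    The E-step computes a soft label in [-1, 1] for each sample;
    the M-step is a least-squares fit to the relabeled responses.
    """
    margin = y * (X @ theta)
    if sigma > 0:
        # Posterior mean of the hidden sign z given (x, y) and the current theta.
        soft_label = np.tanh(margin / sigma**2)
    else:
        # Noiseless limit: the soft label hardens to a sign.
        soft_label = np.sign(margin)
    # M-step: solve the normal equations (X^T X) theta = X^T (soft_label * y).
    return np.linalg.solve(X.T @ X, X.T @ (soft_label * y))


def run_em(X, y, theta0, sigma=0.0, n_iter=20):
    """Run a fixed number of EM updates starting from theta0."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        theta = em_step(theta, X, y, sigma)
    return theta
```

Because the two components differ only by sign, the estimate is identifiable only up to sign, so any comparison against the true parameter should take the smaller of the distances to θ* and to −θ*.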