The expectation-maximization (EM) algorithm and its variants are widely used in statistics. In high-dimensional mixture linear regression, the model is assumed to be a finite mixture of linear regressions, and the number of predictors is much larger than the sample size. The standard EM algorithm, which attempts to find the maximum likelihood estimator, becomes infeasible for such models. We devise a group lasso penalized EM algorithm and study its statistical properties. Existing theoretical results for regularized EM algorithms often rely on dividing the sample into many independent batches and using a fresh batch in each iteration of the algorithm. Our algorithm and theoretical analysis do not require sample splitting and can be extended to multivariate responses. The proposed methods also show encouraging performance in numerical studies.
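To make the idea of a group lasso penalized EM iteration concrete, here is a minimal sketch under stated assumptions; it is not the algorithm analyzed in the paper. It assumes Gaussian noise with a variance shared across components, treats the coefficients of one predictor across all mixture components as one group, and solves the M-step by proximal gradient descent. The names `penalized_em`, `e_step`, `m_step`, `group_soft_threshold`, and the tuning parameter `lam` are hypothetical. Every iteration reuses the full sample, so no sample splitting is involved.

```python
import numpy as np


def group_soft_threshold(B, t):
    """Shrink the l2 norm of each row of B by t; rows with norm <= t become zero."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return B * scale


def e_step(X, y, B, pi, sigma2):
    """Posterior probability of each component for each sample, shape (n, K)."""
    resid = y[:, None] - X @ B
    log_w = np.log(pi)[None, :] - 0.5 * resid ** 2 / sigma2
    log_w -= log_w.max(axis=1, keepdims=True)  # stabilize before exponentiating
    W = np.exp(log_w)
    return W / W.sum(axis=1, keepdims=True)


def m_step(X, y, W, B, lam, n_inner=50):
    """Proximal gradient steps on weighted least squares plus a row-wise group lasso."""
    n = X.shape[0]
    # The weights are at most 1, so ||X||_2^2 / n bounds the Lipschitz constant.
    step = n / np.linalg.norm(X, 2) ** 2
    for _ in range(n_inner):
        grad = X.T @ (W * (X @ B - y[:, None])) / n
        B = group_soft_threshold(B - step * grad, step * lam)
    return B


def penalized_em(X, y, K=2, lam=0.1, n_iter=30, seed=0):
    """Alternate E- and M-steps on the full sample in every iteration."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    B = 0.1 * rng.standard_normal((p, K))  # column k holds component k's coefficients
    pi = np.full(K, 1.0 / K)
    sigma2 = float(np.var(y))
    for _ in range(n_iter):
        W = e_step(X, y, B, pi, sigma2)
        B = m_step(X, y, W, B, lam)
        pi = np.clip(W.mean(axis=0), 1e-8, None)
        pi = pi / pi.sum()
        resid = y[:, None] - X @ B
        sigma2 = max(float(np.sum(W * resid ** 2)) / n, 1e-6)
    return B, pi, sigma2
```

Calling `penalized_em(X, y, K=2, lam=0.1)` with an `(n, p)` design matrix `X` and a length-`n` response `y` returns a `(p, K)` coefficient matrix, the mixing proportions, and the noise variance. Because the penalty acts on whole rows, a predictor is dropped from all components at once; the value of `lam` here is arbitrary and would need tuning in practice.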