Processing high-volume, streaming data is increasingly common in modern statistics and machine learning, where batch-mode algorithms are often impractical because they require repeated passes over the full dataset. This has motivated incremental stochastic estimation methods, including the incremental stochastic Expectation-Maximization (EM) algorithm formulated via stochastic approximation. In this work, we revisit and analyze an incremental stochastic variant of the Majorization-Minimization (MM) algorithm, which subsumes incremental stochastic EM as a special case. Our approach relaxes key EM requirements, such as explicit latent-variable representations, enabling broader applicability and greater algorithmic flexibility. We establish theoretical guarantees for the incremental stochastic MM algorithm, proving consistency in the sense that the iterates converge to a stationary point characterized by a vanishing gradient of the objective. We demonstrate these advantages on a softmax-gated mixture of experts (MoE) regression problem, for which no stochastic EM algorithm is available. Empirically, our method consistently outperforms widely used stochastic optimizers, including stochastic gradient descent (SGD), root mean square propagation (RMSProp), adaptive moment estimation (Adam), and second-order clipped stochastic optimization. These results support the development of new incremental stochastic algorithms, given the central role of softmax-gated MoE architectures in contemporary deep neural networks for heterogeneous data modeling. Beyond synthetic experiments, we also validate practical effectiveness on two real-world datasets, including a bioinformatics study of dent maize genotypes under drought stress that integrates high-dimensional proteomics with ecophysiological traits, where incremental stochastic MM yields stable gains in predictive performance.
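To make the incremental MM idea concrete, the following is a minimal sketch on a toy problem not taken from the paper: least-absolute-deviation (median) estimation, where the absolute loss is majorized by a quadratic surrogate, |r| ≤ r²/(2|r₀|) + |r₀|/2 at the current residual r₀. Each sample contributes a weight to the surrogate; the incremental variant refreshes one sample's weight per step and updates running sums, in the spirit of Neal-Hinton incremental EM. All names (`w`, `S_w`, `S_wy`, the `eps` smoothing constant) are illustrative choices, not the paper's notation or algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=1.0, size=500)
data[:50] += 15.0  # inject outliers; a robust (median-like) estimate should resist them

# Incremental MM for f(theta) = sum_i |y_i - theta|.
# Majorizer weight for sample i: w_i = 1 / |y_i - theta_old| (eps-smoothed),
# so the surrogate minimizer is the weighted mean of the data.
eps = 1e-8
theta = data.mean()                      # initialize at the (non-robust) mean
w = 1.0 / (np.abs(data - theta) + eps)   # per-sample majorizer weights
S_w, S_wy = w.sum(), (w * data).sum()    # running surrogate sufficient sums

for t in range(5 * len(data)):           # a few passes, one sample per step
    i = t % len(data)
    w_new = 1.0 / (abs(data[i] - theta) + eps)
    S_w += w_new - w[i]                  # swap the stale weight for the fresh one
    S_wy += (w_new - w[i]) * data[i]
    w[i] = w_new
    theta = S_wy / S_w                   # minimize the updated incremental surrogate

print(theta)  # a robust location estimate near the bulk of the data
```

Because each step only touches one sample's contribution while reusing the stored sums, the per-iteration cost is O(1) in the dataset size, which is the property that makes incremental MM attractive for streaming data.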