Originally introduced as a neural network for ensemble learning, mixture of experts (MoE) has recently become a fundamental building block of highly successful modern deep neural networks for heterogeneous data analysis in several applications, including machine learning, statistics, bioinformatics, economics, and medicine. Despite its popularity in practice, the convergence behavior of parameter estimation in Gaussian-gated MoE is still far from fully understood. The underlying difficulty is that covariates appear in both the Gaussian gating and the expert networks, which leads to intrinsically complex interactions between them via partial differential equations with respect to their parameters. We address these issues by designing novel Voronoi loss functions that accurately capture heterogeneity in the maximum likelihood estimator (MLE) and thereby resolve parameter estimation in these models. Our results reveal distinct behaviors of the MLE under two settings: in the first, all location parameters in the Gaussian gating are nonzero; in the second, at least one location parameter is zero. Notably, these behaviors can be characterized by the solvability of two different systems of polynomial equations. Finally, we conduct a simulation study to verify our theoretical results.