Mixture-of-experts (MoE) models have a well-principled finite mixture model construction for prediction, in which the gating network (mixture weights) and the expert network (mixture component densities) both learn from the predictors (explanatory variables). We investigate the estimation properties of MoEs in a high-dimensional setting, where the number of predictors is much larger than the sample size, and for which the literature lacks computational and, especially, theoretical results. We consider the class of finite MoE models with softmax gating functions and Gaussian regression experts, and focus on the theoretical properties of their $l_1$-regularized estimation via the Lasso. We provide a lower bound on the regularization parameter of the Lasso penalty that ensures that the Lasso estimator satisfies an $l_1$-oracle inequality with respect to the Kullback--Leibler loss. We further state an $l_1$-ball oracle inequality for the $l_1$-penalized maximum likelihood estimator in the model selection setting.
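To make the model class concrete, the following minimal sketch evaluates the conditional log-density of a softmax-gated MoE with Gaussian regression experts and the corresponding $l_1$-penalized negative log-likelihood. The parameterization, all function and argument names, and the choice to penalize only the gating and expert slope coefficients (not intercepts or variances) are illustrative assumptions and are not taken from the paper.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def moe_log_density(x, y, gate_w, gate_b, exp_w, exp_b, sigma):
    """log s(y | x) = log sum_k g_k(x) * N(y; exp_b[k] + exp_w[k].x, sigma[k]^2).

    g_k(x) is the softmax of the gating scores gate_b[k] + gate_w[k].x.
    (Hypothetical parameterization, chosen for illustration.)
    """
    logits = [b + dot(w, x) for w, b in zip(gate_w, gate_b)]
    log_z = log_sum_exp(logits)  # log of the softmax normaliser
    terms = []
    for k, logit in enumerate(logits):
        mean = exp_b[k] + dot(exp_w[k], x)
        log_normal = (-0.5 * math.log(2 * math.pi * sigma[k] ** 2)
                      - 0.5 * ((y - mean) / sigma[k]) ** 2)
        terms.append(logit - log_z + log_normal)  # log g_k(x) + log N_k(y)
    return log_sum_exp(terms)


def l1_norm(rows):
    """Sum of absolute values over a list of coefficient vectors."""
    return sum(abs(v) for row in rows for v in row)


def lasso_objective(X, Y, gate_w, gate_b, exp_w, exp_b, sigma, lam):
    """Average negative log-likelihood plus lam times the l1 norm of the slopes.

    Assumption: only gate_w and exp_w are penalized.
    """
    nll = -sum(
        moe_log_density(x, y, gate_w, gate_b, exp_w, exp_b, sigma)
        for x, y in zip(X, Y)
    ) / len(Y)
    return nll + lam * (l1_norm(gate_w) + l1_norm(exp_w))
```

In this sketch, `lam` plays the role of the regularization parameter for which the abstract states a lower bound. The sketch only evaluates the objective; it does not minimize it.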