We investigate the estimation properties of the mixture of experts (MoE) model in a high-dimensional setting where the number of predictors is much larger than the sample size, a regime in which theoretical results are particularly scarce. We consider the class of softmax-gated Gaussian MoE (SGMoE) models, i.e., MoE models with softmax gating functions and Gaussian experts, and focus on the theoretical properties of their $l_1$-regularized estimation via the Lasso. To the best of our knowledge, we are the first to investigate the $l_1$-regularization properties of SGMoE models from a non-asymptotic perspective under mild assumptions, namely the boundedness of the parameter space. We provide a lower bound on the regularization parameter of the Lasso penalty that ensures non-asymptotic theoretical control of the Kullback--Leibler loss of the Lasso estimator for SGMoE models. Finally, we carry out a simulation study to empirically validate our theoretical findings.
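For concreteness, the following is a minimal sketch of the objects involved, under standard SGMoE notation that is assumed here rather than fixed by the abstract (the symbols $\psi$, $\gamma_k$, $\beta_k$, $\sigma_k^2$, and the exact penalty form are illustrative). With covariates $x \in \mathbb{R}^p$, response $y \in \mathbb{R}$, and $K$ experts, the SGMoE conditional density reads
\[
s_{\psi}(y \mid x) \;=\; \sum_{k=1}^{K} \frac{\exp\!\big(\gamma_{k0} + \gamma_k^\top x\big)}{\sum_{l=1}^{K} \exp\!\big(\gamma_{l0} + \gamma_l^\top x\big)}\, \phi\!\big(y;\, \beta_{k0} + \beta_k^\top x,\, \sigma_k^2\big),
\]
where $\phi(\cdot\,; \mu, \sigma^2)$ denotes the univariate Gaussian density, and the Lasso estimator minimizes the $l_1$-penalized negative log-likelihood over the sample $(X_i, Y_i)_{i=1}^{n}$,
\[
\widehat{s}^{\,\mathrm{L}} \in \operatorname*{arg\,min}_{\psi} \Bigg\{ -\frac{1}{n}\sum_{i=1}^{n} \ln s_{\psi}(Y_i \mid X_i) \;+\; \lambda \sum_{k=1}^{K} \big( \lVert \gamma_k \rVert_1 + \lVert \beta_k \rVert_1 \big) \Bigg\},
\]
with regularization parameter $\lambda > 0$. The main result is a lower bound on $\lambda$ under which the Kullback--Leibler loss of $\widehat{s}^{\,\mathrm{L}}$ admits non-asymptotic control.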