Mixture-of-experts (MoE) models are a popular framework for modeling heterogeneity in data, for both regression and classification problems in statistics and machine learning, owing to their flexibility and to the abundance of available statistical estimation and model selection tools. This flexibility comes from allowing the mixture weights (or gating functions) of the MoE model, along with the experts (or component densities), to depend on the explanatory variables. This permits the modeling of data arising from more complex data-generating processes than classical finite mixtures and finite mixtures of regression models, whose mixing parameters are independent of the covariates. Using MoE models in a high-dimensional setting, where the number of explanatory variables can be much larger than the sample size, is challenging from a computational point of view and, in particular, from a theoretical point of view, where the literature still lacks results for dealing with the curse of dimensionality, for both the statistical estimation and feature selection problems. We consider the finite MoE model with soft-max gating functions and Gaussian experts for high-dimensional regression on heterogeneous data, and its $l_1$-regularized estimation via the Lasso. We focus on the estimation properties of the Lasso rather than its feature selection properties. We provide a lower bound on the regularization parameter of the Lasso function that ensures that the Lasso estimator satisfies an $l_1$-oracle inequality with respect to the Kullback--Leibler loss.
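To make the model concrete, the following is a minimal NumPy sketch of the conditional log-density of a soft-max-gated MoE with Gaussian experts, together with an $l_1$-penalized negative log-likelihood of the kind the Lasso minimizes. It is an illustration under stated assumptions, not the estimator analyzed here: all names (`W`, `b`, `B`, `c`, `sigma`, `lam`, and the function names) are hypothetical, the experts are assumed to have linear means, and the choice to penalize only the gating weights and expert regression coefficients (leaving intercepts and variances unpenalized) is an assumption of this sketch.

```python
import numpy as np


def _logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along the given axis."""
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True)), axis=axis)


def log_softmax_gates(X, W, b):
    """Log gating probabilities, shape (n, K): log softmax_k(b_k + w_k^T x)."""
    logits = X @ W.T + b  # (n, K)
    return logits - _logsumexp(logits, axis=1)[:, None]


def moe_log_density(y, X, W, b, B, c, sigma):
    """log of sum_k g_k(x) * N(y; c_k + beta_k^T x, sigma_k^2), one value per row."""
    means = X @ B.T + c  # (n, K) expert means (assumed linear in x)
    log_normal = (-0.5 * np.log(2.0 * np.pi * sigma**2)
                  - 0.5 * ((y[:, None] - means) / sigma) ** 2)
    return _logsumexp(log_softmax_gates(X, W, b) + log_normal, axis=1)


def lasso_objective(y, X, W, b, B, c, sigma, lam):
    """Average negative log-likelihood plus an l1 penalty.

    Assumption of this sketch: only W (gating weights) and B (expert
    regression coefficients) are penalized.
    """
    nll = -np.mean(moe_log_density(y, X, W, b, B, c, sigma))
    return nll + lam * (np.abs(W).sum() + np.abs(B).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, K = 20, 50, 2  # more covariates than observations (p > n)
    X = rng.normal(size=(n, p))
    y = rng.normal(size=n)
    W, B = np.zeros((K, p)), np.zeros((K, p))
    b, c, sigma = np.zeros(K), np.zeros(K), np.ones(K)
    print(lasso_objective(y, X, W, b, B, c, sigma, lam=0.1))
```

The log-sum-exp form avoids underflow when a gate or an expert density is very small. The sketch does not impose an identifiability constraint on the gates (for example, fixing one gate's parameters to zero); a practical implementation would typically need one.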