Parameter estimation in the softmax gating Gaussian mixture of experts has remained a long-standing open problem in the literature. This is mainly due to three fundamental theoretical challenges associated with the softmax gating function: (i) the parameters are identifiable only up to translation; (ii) the softmax gating and the expert functions in the Gaussian density interact intrinsically via partial differential equations; and (iii) the numerator and denominator of the conditional density of the softmax gating Gaussian mixture of experts depend on each other in a complex way. We resolve these challenges by proposing novel Voronoi loss functions among parameters and establishing the convergence rates of the maximum likelihood estimator (MLE) for parameter estimation in these models. When the true number of experts is unknown and over-specified, our findings show a connection between the convergence rate of the MLE and a solvability problem of a system of polynomial equations.
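Challenge (i) can be checked numerically. The following is a minimal sketch, assuming a gate that is linear in the covariate and Gaussian experts with linear means; the abstract does not fix this parameterization, and the function and parameter names (`softmax_gating_density`, `beta1`, `beta0`, `a`, `b`, `sigma`) are hypothetical. It shows that adding a common shift to all gating parameters leaves the conditional density unchanged.

```python
import numpy as np

def softmax_gating_density(y, x, beta1, beta0, a, b, sigma):
    """Conditional density p(y | x) of a softmax gating Gaussian mixture of experts.

    Assumed (hypothetical) parameterization:
      beta1: (k, d) gating slopes, beta0: (k,) gating biases
      a:     (k, d) expert slopes, b:     (k,) expert intercepts
      sigma: (k,)   expert standard deviations
    """
    logits = beta1 @ x + beta0
    logits = logits - logits.max()            # stabilize the exponentials
    weights = np.exp(logits) / np.exp(logits).sum()
    means = a @ x + b
    gauss = np.exp(-0.5 * ((y - means) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return float(weights @ gauss)

rng = np.random.default_rng(0)
k, d = 3, 2
beta1, beta0 = rng.normal(size=(k, d)), rng.normal(size=k)
a, b = rng.normal(size=(k, d)), rng.normal(size=k)
sigma = rng.uniform(0.5, 1.5, size=k)
x, y = rng.normal(size=d), 0.3

# Translate every gating slope by the same vector t1 and every bias by the same scalar t0.
t1, t0 = rng.normal(size=d), 1.7
p_original = softmax_gating_density(y, x, beta1, beta0, a, b, sigma)
p_shifted = softmax_gating_density(y, x, beta1 + t1, beta0 + t0, a, b, sigma)

# The common shift cancels between numerator and denominator of the softmax.
print(np.isclose(p_original, p_shifted))
```

Because two distinct parameter vectors yield the same density, the gating parameters can be recovered only up to this common shift, which is why a loss over parameters has to account for the translation.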