Mixture-of-experts models provide a flexible framework for learning complex probabilistic input-output relationships by combining multiple expert models through an input-dependent gating mechanism. These models have become increasingly prominent in modern machine learning, yet their theoretical properties in the Bayesian framework remain largely unexplored. In this paper, we study Bayesian mixture-of-experts models, focusing on the ubiquitous softmax-based gating mechanism. Specifically, we investigate the asymptotic behavior of the posterior distribution for three fundamental statistical tasks: density estimation, parameter estimation, and model selection. First, we establish posterior contraction rates for density estimation in two regimes: one with a fixed, known number of experts and one with a random, learnable number of experts. We then analyze parameter estimation and derive convergence guarantees based on tailored Voronoi-type losses, which account for the complex identifiability structure of mixture-of-experts models. Finally, we propose and analyze two complementary strategies for selecting the number of experts. Taken together, these results provide one of the first systematic theoretical analyses of Bayesian mixture-of-experts models with softmax gating, and yield several theory-grounded insights for practical model design.
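To make the softmax gating mechanism concrete, the following is a minimal sketch of the conditional density of a softmax-gated mixture of experts. It assumes Gaussian experts with linear means and linear gating scores; the abstract does not specify the expert family, so that choice, along with the names `softmax`, `gaussian_pdf`, `moe_density`, and the parameter keys `gate_w`, `gate_b`, `a`, `b`, `sigma`, is purely illustrative and not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of real-valued scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_pdf(y, mean, sigma):
    """Density of a univariate normal N(mean, sigma^2) evaluated at y."""
    z = (y - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def moe_density(y, x, experts):
    """Conditional density p(y | x) of a softmax-gated mixture of experts.

    Assumed (hypothetical) parameterization: each expert is a dict with
    gating parameters (gate_w, gate_b) and Gaussian expert parameters
    (a, b, sigma), giving
        p(y | x) = sum_k softmax_k(gate_w_k . x + gate_b_k)
                         * N(y | a_k . x + b_k, sigma_k^2).
    """
    # Input-dependent mixing weights: they are positive and sum to one.
    gates = softmax([dot(e["gate_w"], x) + e["gate_b"] for e in experts])
    # Weighted sum of the expert densities evaluated at y.
    return sum(
        g * gaussian_pdf(y, dot(e["a"], x) + e["b"], e["sigma"])
        for g, e in zip(gates, experts)
    )
```

In this sketch the mixing weights depend on the input `x`, which is what distinguishes a mixture of experts from an ordinary finite mixture. Adding the same constant to every gating score leaves the softmax output unchanged, which is one simple source of the non-identifiability that parameter-level analyses of such models must account for.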