CoCoAFusE: Beyond Mixtures of Experts via Model Fusion

Many learning problems involve multiple patterns and degrees of uncertainty that vary with the covariates. Advances in Deep Learning (DL) have addressed these issues by learning highly nonlinear input-output dependencies. However, model interpretability and Uncertainty Quantification (UQ) have often lagged behind. In this context, we introduce the Competitive/Collaborative Fusion of Experts (CoCoAFusE), a novel Bayesian Covariates-Dependent Modeling technique. CoCoAFusE builds on the philosophy behind Mixtures of Experts (MoEs), blending predictions from several simple sub-models (or "experts") to achieve high expressiveness while retaining a substantial degree of local interpretability. Our formulation extends that of a classical Mixture of Experts by considering the fusion of the experts' distributions in addition to their more usual mixing (i.e., superposition). Through this additional feature, CoCoAFusE better accommodates different scenarios for the intermediate behavior between generating mechanisms, resulting in tighter credible bounds on the response variable. Indeed, resorting only to mixing, as in classical MoEs, may lead to multimodality artifacts, especially over smooth transitions. CoCoAFusE, by contrast, can avoid these artifacts even under the same structure and priors for the experts, leading to greater expressiveness and flexibility in modeling. This new approach is showcased extensively on a suite of motivating numerical examples and a collection of real-data ones, demonstrating its efficacy in tackling complex regression problems where uncertainty is a key quantity of interest.
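
The contrast between mixing and fusing can be illustrated with a small numerical sketch. The abstract does not specify the fusion operator, so the example below makes its own assumptions: two Gaussian experts, a fixed gate weight `w`, and a hypothetical "fusion" that interpolates the experts' means and standard deviations into a single Gaussian. The names `mixture_pdf`, `fused_pdf`, `count_modes`, and `central_interval_width` are illustrative and are not taken from the paper. Under these assumptions, mixing two well-separated experts halfway through a transition gives a bimodal predictive density with a wide central interval, while the fused density stays unimodal and narrower.

```python
import numpy as np


def gaussian_pdf(y, mean, std):
    """Density of a univariate Gaussian evaluated on the array y."""
    return np.exp(-0.5 * ((y - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def mixture_pdf(y, w, means, stds):
    """Classical MoE-style mixing: weighted superposition of two expert densities."""
    return (w * gaussian_pdf(y, means[0], stds[0])
            + (1.0 - w) * gaussian_pdf(y, means[1], stds[1]))


def fused_pdf(y, w, means, stds):
    """Hypothetical fusion (an assumption, not the paper's operator):
    one Gaussian whose mean and std interpolate those of the experts."""
    mean = w * means[0] + (1.0 - w) * means[1]
    std = w * stds[0] + (1.0 - w) * stds[1]
    return gaussian_pdf(y, mean, std)


def count_modes(density):
    """Count strict interior local maxima of a density sampled on a grid."""
    interior = density[1:-1]
    return int(np.sum((interior > density[:-2]) & (interior > density[2:])))


def central_interval_width(y, density, level=0.9):
    """Width of the central interval holding `level` of the probability mass."""
    cdf = np.cumsum(density) * (y[1] - y[0])
    cdf /= cdf[-1]  # renormalize to absorb discretization error
    lower = y[np.searchsorted(cdf, (1.0 - level) / 2.0)]
    upper = y[np.searchsorted(cdf, 1.0 - (1.0 - level) / 2.0)]
    return upper - lower


# Two well-separated experts, evaluated halfway through a transition (w = 0.5).
y = np.linspace(-10.0, 10.0, 4001)
means, stds, w = (-3.0, 3.0), (0.5, 0.5), 0.5

mix = mixture_pdf(y, w, means, stds)
fus = fused_pdf(y, w, means, stds)

print("modes   (mixing, fusion):", count_modes(mix), count_modes(fus))
print("90% width (mixing, fusion):",
      central_interval_width(y, mix), central_interval_width(y, fus))
```

With these illustrative settings the mixed density has a peak near each expert's mean, whereas the fused density has a single peak between them. Which behavior is appropriate depends on the data: mixing suits genuinely competing mechanisms, while a fused form suits smooth transitions between them.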
