Ensembling can improve the performance of neural networks, but existing approaches struggle when the architecture likelihood surface has dispersed, narrow peaks. Furthermore, existing methods construct equally weighted ensembles, which are likely to be vulnerable to the failure modes of their weaker architectures. By viewing ensembling as approximately marginalising over architectures, we construct ensembles using the tools of Bayesian Quadrature, which are well suited to exploring likelihood surfaces with dispersed, narrow peaks. Additionally, the resulting ensembles consist of architectures weighted commensurately with their performance. We show empirically, in terms of test likelihood, accuracy, and expected calibration error, that our method outperforms state-of-the-art baselines, and verify via ablation studies that each of its components contributes to this improvement independently.
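To make the contrast between equally weighted and performance-weighted ensembles concrete, the following is a minimal sketch and not the method of this work. It assumes a simple stand-in weighting in which each architecture's weight is proportional to the exponential of its held-out log-likelihood (a uniform prior over architectures); the Bayesian Quadrature weights described above are not reproduced here. The function names `likelihood_weights` and `ensemble_predict`, and all numbers, are hypothetical and chosen only for illustration.

```python
import numpy as np


def likelihood_weights(val_log_liks):
    """Weights proportional to exp(held-out log-likelihood), normalised to sum to 1.

    Assumption: a uniform prior over architectures, so the weight of each
    architecture is its normalised likelihood. This is a stand-in, not the
    Bayesian Quadrature weighting.
    """
    z = np.asarray(val_log_liks, dtype=float)
    z = z - z.max()  # subtract the maximum for numerical stability
    w = np.exp(z)
    return w / w.sum()


def ensemble_predict(probs, weights):
    """Weighted average of per-architecture predictive distributions.

    probs:   array of shape (n_models, n_samples, n_classes)
    weights: array of shape (n_models,), summing to 1
    returns: array of shape (n_samples, n_classes)
    """
    probs = np.asarray(probs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return np.tensordot(weights, probs, axes=1)


# Toy data (illustrative only): three architectures, two inputs, two classes.
val_log_liks = [-120.0, -100.0, -101.0]
probs = np.array([
    [[0.9, 0.1], [0.2, 0.8]],  # weak architecture, confidently different
    [[0.6, 0.4], [0.3, 0.7]],
    [[0.5, 0.5], [0.4, 0.6]],
])

w = likelihood_weights(val_log_liks)
equal = ensemble_predict(probs, np.full(3, 1.0 / 3.0))
weighted = ensemble_predict(probs, w)

print("weights:", w)
print("equal-weight prediction:\n", equal)
print("likelihood-weighted prediction:\n", weighted)
```

In this toy setting the weakest architecture receives a negligible weight, so its predictions barely affect the weighted ensemble, whereas the equally weighted ensemble gives it a full one-third vote.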