Ensembling can improve the performance of neural networks, but existing approaches struggle when the architecture likelihood surface has dispersed, narrow peaks. Furthermore, existing methods construct equally weighted ensembles, which are likely to be vulnerable to the failure modes of the weaker architectures. By viewing ensembling as approximately marginalising over architectures, we construct ensembles using the tools of Bayesian Quadrature, which are well suited to exploring likelihood surfaces with dispersed, narrow peaks. Additionally, the resulting ensembles consist of architectures weighted commensurately with their performance. We show empirically, in terms of test likelihood, accuracy, and expected calibration error, that our method outperforms state-of-the-art baselines, and verify via ablation studies that each of its components contributes independently to this improvement.
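To make the contrast between equally weighted and performance-weighted ensembles concrete, the following is a minimal sketch rather than the method itself. It assumes a uniform prior over architectures, so each member's weight is simply its normalised likelihood; the actual Bayesian Quadrature weights are not specified in this abstract and would differ. The function names and all numbers are hypothetical.

```python
import math


def likelihood_weights(log_likelihoods):
    """Turn per-architecture log-likelihoods into normalised ensemble weights.

    Assumption: uniform prior over architectures, so weight is proportional
    to likelihood. This is a stand-in for Bayesian Quadrature weights.
    """
    m = max(log_likelihoods)  # subtract the max for numerical stability
    unnorm = [math.exp(ll - m) for ll in log_likelihoods]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def ensemble_predict(member_probs, weights):
    """Weighted average of the members' class-probability vectors."""
    n_classes = len(member_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, member_probs))
        for c in range(n_classes)
    ]


# Hypothetical validation log-likelihoods for three architectures;
# the third is much weaker than the other two.
log_likelihoods = [-120.0, -121.0, -140.0]

# Hypothetical class probabilities from each architecture for one input.
member_probs = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],
]

weights = likelihood_weights(log_likelihoods)
weighted = ensemble_predict(member_probs, weights)
uniform = ensemble_predict(member_probs, [1.0 / 3.0] * 3)

print("weights: ", weights)
print("weighted:", weighted)
print("uniform: ", uniform)
```

In this toy setting the weak third architecture receives almost no weight, so its confident vote for the last class barely affects the weighted prediction, whereas it pulls the equally weighted prediction noticeably toward that class.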