We establish the first mathematically rigorous link between Bayesian, variational Bayesian, and ensemble methods. A key step towards this is to reformulate the non-convex optimisation problem typically encountered in deep learning as a convex optimisation problem in the space of probability measures. On a technical level, our contribution amounts to studying generalised variational inference through the lens of Wasserstein gradient flows. The result is a unified theory of various seemingly disconnected approaches that are commonly used for uncertainty quantification in deep learning, including deep ensembles and (variational) Bayesian methods. This offers a fresh perspective on why deep ensembles tend to outperform procedures based on parameterised variational inference, and allows us to derive new ensembling schemes with convergence guarantees. We showcase this by proposing a family of interacting deep ensembles with direct parallels to the interactions of particle systems in thermodynamics, and use our theory to prove that these algorithms converge to a well-defined global minimiser on the space of probability measures.
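To make the "ensemble members as interacting particles" picture concrete, here is a minimal sketch under explicit assumptions. It is not the paper's algorithm. It uses a toy non-convex loss `l(theta) = (theta**2 - 1)**2` and an illustrative free energy `F(Q) = ∫ l dQ + lam_ent ∫ q log q + (lam_rep/2) ∫∫ k dQ dQ` with a Gaussian kernel `k`, which is convex in `Q` even though `l` is non-convex in `theta`. Each ensemble member is one particle. An Euler-Maruyama step of the corresponding Wasserstein gradient flow is gradient descent on `l`, plus a kernel repulsion term, plus Langevin noise. All function names and parameters (`double_well_grad`, `interacting_ensemble_step`, `run`, `lam_ent`, `lam_rep`, `h`) are hypothetical.

```python
import numpy as np


def double_well_grad(theta):
    # Gradient of the toy non-convex loss l(theta) = (theta**2 - 1)**2,
    # whose two global minima sit at theta = -1 and theta = +1.
    return 4.0 * theta * (theta**2 - 1.0)


def interacting_ensemble_step(theta, grad_loss, eta, lam_ent, lam_rep, h, rng):
    """One Euler-Maruyama step for N particles (one per ensemble member).

    Illustrative assumption: particles follow the Wasserstein gradient flow of
    F(Q) = E_Q[l] + lam_ent * (negative entropy) + (lam_rep / 2) * E_{Q x Q}[k],
    with Gaussian kernel k(x, y) = exp(-(x - y)**2 / (2 h**2)).
    lam_ent = 0 and lam_rep = 0 reduce to independent gradient descent.
    """
    # diff[j, m] = theta_j - theta_m; kern[j, m] = k(theta_j, theta_m).
    diff = theta[:, None] - theta[None, :]
    kern = np.exp(-diff**2 / (2.0 * h**2))
    # -grad_x of lam_rep * mean_m k(x, theta_m), evaluated at each particle:
    # pushes crowded particles apart.
    repulsion = lam_rep * np.mean(diff / h**2 * kern, axis=1)
    drift = -grad_loss(theta) + repulsion
    # The entropy term becomes Brownian noise with variance 2 * eta * lam_ent.
    noise = np.sqrt(2.0 * eta * lam_ent) * rng.standard_normal(theta.shape)
    return theta + eta * drift + noise


def run(theta0, steps=2000, eta=1e-2, lam_ent=0.0, lam_rep=0.0, h=0.5, seed=0):
    # Simulate the particle system from the initial ensemble theta0.
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    for _ in range(steps):
        theta = interacting_ensemble_step(
            theta, double_well_grad, eta, lam_ent, lam_rep, h, rng
        )
    return theta


if __name__ == "__main__":
    # Plain deep ensemble: each member falls into its nearest minimum.
    print(run([-2.0, -0.5, 0.3, 1.7]))
    # Repulsive ensemble: two nearby members are kept apart.
    print(run([0.9, 0.95], lam_rep=4.0))
```

With `lam_ent = lam_rep = 0` the members do not interact. Each one converges to the minimum nearest its initialisation, so `[-2, -0.5, 0.3, 1.7]` ends near `[-1, -1, 1, 1]`. This is how a standard deep ensemble behaves. A positive `lam_rep` keeps members that start close together from collapsing onto the same point. A positive `lam_ent` turns the update into Langevin dynamics, whose continuous-time stationary law is proportional to `exp(-l / lam_ent)` when `lam_rep = 0`.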