We establish the first mathematically rigorous link between Bayesian, variational Bayesian, and ensemble methods. A key step towards this is to reformulate the non-convex optimisation problem typically encountered in deep learning as a convex optimisation problem in the space of probability measures. On a technical level, our contribution amounts to studying generalised variational inference through the lens of Wasserstein gradient flows. The result is a unified theory of various seemingly disconnected approaches that are commonly used for uncertainty quantification in deep learning, including deep ensembles and (variational) Bayesian methods. This offers a fresh perspective on the reasons behind the success of deep ensembles over procedures based on parameterised variational inference, and allows the derivation of new ensembling schemes with convergence guarantees. We showcase this by proposing a family of interacting deep ensembles with direct parallels to the interactions of particle systems in thermodynamics, and use our theory to prove the convergence of these algorithms to a well-defined global minimiser on the space of probability measures.
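
As a hedged illustration of the convex reformulation described above, the following sketch shows why moving from parameters to probability measures can remove non-convexity. The symbols $\ell$ (loss), $Q$ (a probability measure over parameters $\theta$), $P$ (a reference measure), $D$ (a divergence that is convex in its first argument) and $\lambda \geq 0$ are notation assumed here for illustration; they do not appear in the abstract, and the objective in the work itself may differ in form.

```latex
% Assumed notation (not from the abstract): \ell, Q, P, D, \lambda.
% A regularised objective over probability measures:
\begin{align*}
  L(Q) &= \int \ell(\theta)\, \mathrm{d}Q(\theta) + \lambda\, D(Q, P).
\end{align*}
% The expected-loss term is linear in Q: for the mixture
% Q_t = (1 - t)\, Q_0 + t\, Q_1 with t \in [0, 1],
\begin{align*}
  \int \ell \, \mathrm{d}Q_t
    &= (1 - t) \int \ell \, \mathrm{d}Q_0 + t \int \ell \, \mathrm{d}Q_1 .
\end{align*}
```

Under these assumptions, $L$ is convex in $Q$ whenever $D(\cdot, P)$ is, for example when $D$ is the Kullback-Leibler divergence, even if $\ell$ is non-convex as a function of $\theta$.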