Variational mean-field approximations tend to struggle with contemporary overparametrized deep neural networks. While a Bayesian treatment is usually associated with high-quality predictions and uncertainty estimates, the practical reality has been the opposite: unstable training, poor predictive power, and subpar calibration. Building upon recent work on reparametrizations of neural networks, we propose a simple variational family that considers two independent linear subspaces of the parameter space, representing functional changes inside and outside the support of the training data. This allows us to build a fully-correlated approximate posterior that reflects the overparametrization and is governed by easy-to-interpret hyperparameters. We develop scalable numerical routines that maximize the associated evidence lower bound (ELBO) and sample from the approximate posterior. Empirically, we observe state-of-the-art performance across tasks, models, and datasets compared to a wide array of baseline methods. Our results show that approximate Bayesian inference applied to deep neural networks is far from a lost cause when the inference mechanism reflects the geometry of reparametrizations.
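To make the construction concrete, the following is a minimal sketch of such a two-subspace family; the notation is introduced here for illustration only and is not taken from the abstract: a point estimate $\theta^{*}$, orthonormal bases $U$ and $V$ spanning the in-support and out-of-support directions, and scale hyperparameters $\alpha, \beta$. Under these assumptions, one can write a Gaussian whose covariance decomposes over the two subspaces and fit the scales by maximizing the standard ELBO:
\[
  q_{\alpha,\beta}(\theta) \;=\; \mathcal{N}\!\big(\theta \,\big|\, \theta^{*},\; \alpha^{2} U U^{\top} + \beta^{2} V V^{\top}\big),
  \qquad
  \mathcal{L}(\alpha,\beta) \;=\; \mathbb{E}_{q_{\alpha,\beta}}\!\big[\log p(\mathcal{D}\mid\theta)\big] \;-\; \mathrm{KL}\!\big(q_{\alpha,\beta}\,\big\|\,p(\theta)\big).
\]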