Bayesian deep learning relies on the quality of posterior distribution estimation. However, the posterior of deep neural networks is highly multi-modal in nature, with local modes exhibiting varying generalization performance. Under a practical computational budget, targeting the original posterior can lead to suboptimal performance, as some samples may become trapped in "bad" modes and suffer from overfitting. Leveraging the observation that "good" modes with low generalization error often reside in flat basins of the energy landscape, we propose to bias sampling on the posterior toward these flat regions. Specifically, we introduce an auxiliary guiding variable, whose stationary distribution resembles a smoothed posterior free of sharp modes, to guide the MCMC sampler toward flat basins. By integrating this guiding variable with the model parameters, we create a simple joint distribution that enables efficient sampling with minimal computational overhead. We prove the convergence of our method and further show that it converges faster than several existing flatness-aware methods in the strongly convex setting. Empirical results demonstrate that our method can successfully sample from flat basins of the posterior, and outperforms all compared baselines on multiple benchmarks including classification, calibration, and out-of-distribution detection.