Bayesian deep learning relies on the quality of posterior distribution estimation. However, the posterior of a deep neural network is inherently highly multi-modal, and its local modes exhibit varying generalization performance. Under a practical budget, targeting the original posterior can lead to suboptimal performance, as some samples may become trapped in "bad" modes and overfit. Building on the observation that "good" modes with low generalization error often reside in flat basins of the energy landscape, we propose to bias posterior sampling toward these flat regions. Specifically, we introduce an auxiliary guiding variable whose stationary distribution resembles a smoothed posterior free of sharp modes, which leads the MCMC sampler to flat basins. By combining this guiding variable with the model parameters, we construct a simple joint distribution that enables efficient sampling with minimal computational overhead. We prove the convergence of our method and further show that, in the strongly convex setting, it converges faster than several existing flatness-aware methods. Empirical results demonstrate that our method successfully samples from flat basins of the posterior and outperforms all compared baselines on multiple benchmarks, including classification, calibration, and out-of-distribution detection.
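The abstract does not state the form of the joint distribution. Below is a minimal sketch, assuming a Gaussian coupling between the parameter θ and the guiding variable θ_a, with joint density proportional to exp(-U(θ) - ||θ - θ_a||² / (2η)), where U is the negative log posterior. Under that assumption, integrating out θ_a leaves θ distributed exactly as the original posterior, while θ_a follows the posterior convolved with a Gaussian of variance η, i.e. a smoothed posterior. Both variables are updated with unadjusted Langevin dynamics. The coupling form, the function names `joint_langevin_step` and `sample_joint`, the parameters `eta`, `lr`, and `temperature`, and the toy Gaussian energy are all hypothetical illustrations, not the paper's actual algorithm.

```python
import numpy as np


def joint_langevin_step(theta, theta_a, grad_U, lr, eta, rng, temperature=1.0):
    """One unadjusted Langevin step on the ASSUMED joint energy
    H(theta, theta_a) = U(theta) + ||theta - theta_a||^2 / (2 * eta).

    theta is the model parameter; theta_a is the (hypothetical) guiding variable.
    temperature=0 turns the update into plain gradient descent on H.
    """
    diff = theta - theta_a
    grad_theta = grad_U(theta) + diff / eta  # dH/dtheta
    grad_theta_a = -diff / eta               # dH/dtheta_a
    noise_scale = np.sqrt(2.0 * lr * temperature)
    new_theta = theta - lr * grad_theta + noise_scale * rng.standard_normal(np.shape(theta))
    new_theta_a = theta_a - lr * grad_theta_a + noise_scale * rng.standard_normal(np.shape(theta_a))
    return new_theta, new_theta_a


def sample_joint(grad_U, theta0, theta_a0, lr, eta, n_steps, seed=0):
    """Run the joint chain for n_steps and return the final (theta, theta_a)."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    theta_a = np.array(theta_a0, dtype=float)
    for _ in range(n_steps):
        theta, theta_a = joint_langevin_step(theta, theta_a, grad_U, lr, eta, rng)
    return theta, theta_a


if __name__ == "__main__":
    # Toy check: U(theta) = theta^2 / 2, i.e. a standard Gaussian "posterior".
    # Marginally theta ~ N(0, 1) (the original posterior), while
    # theta_a ~ N(0, 1 + eta) (the posterior smoothed by a Gaussian of variance eta).
    n_chains, eta = 2000, 1.0
    theta, theta_a = sample_joint(lambda t: t, np.zeros(n_chains), np.zeros(n_chains),
                                  lr=0.01, eta=eta, n_steps=3000)
    print("var(theta)   ~", theta.var())    # should be near 1.0
    print("var(theta_a) ~", theta_a.var())  # should be near 1 + eta = 2.0
```

In this toy case each chain is vectorized across 2000 independent copies, so the empirical variances of the final states estimate the two marginals. The coupling strength η controls how strongly θ_a's landscape is smoothed: a larger η flattens sharp modes more, at the cost of a looser tie between θ_a and θ.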