Bayesian neural networks (BNNs) estimate the posterior distribution of model parameters and use posterior samples for Bayesian Model Averaging (BMA) in prediction. However, despite the crucial role of loss-landscape flatness in improving the generalization of neural networks, its impact on BMA has been largely overlooked. In this work, we explore how posterior flatness influences BMA generalization and empirically demonstrate that (1) most approximate Bayesian inference methods fail to yield a flat posterior and (2) BMA predictions that do not account for posterior flatness are less effective at improving generalization. To address this, we propose Flat Posterior-aware Bayesian Model Averaging (FP-BMA), a novel training objective that explicitly encourages flat posteriors in a principled Bayesian manner. We also introduce a Flat Posterior-aware Bayesian Transfer Learning scheme that enhances generalization in downstream tasks. Empirically, we show that FP-BMA successfully captures flat posteriors, improving generalization performance.
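For reference, BMA forms the predictive distribution by marginalizing over the parameter posterior and, in practice, approximates this integral by averaging over posterior samples; the following is the standard formulation in generic notation, not necessarily the exact notation used in this work:
\[
p(y \mid x, \mathcal{D}) \;=\; \int p(y \mid x, \theta)\, p(\theta \mid \mathcal{D})\, d\theta
\;\approx\; \frac{1}{S} \sum_{s=1}^{S} p(y \mid x, \theta_s),
\qquad \theta_s \sim p(\theta \mid \mathcal{D}),
\]
where $\mathcal{D}$ denotes the training data and $\theta_s$ are samples drawn from the (approximate) posterior.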