Bayesian Neural Networks (BNNs) provide a probabilistic interpretation of deep learning models by imposing a prior distribution over model parameters and inferring a posterior distribution from observed data. Models sampled from the posterior distribution can be used to produce ensemble predictions and to quantify prediction uncertainty. It is well known that deep learning models with lower sharpness have better generalization ability. However, existing posterior inference methods do not account for sharpness/flatness in their formulation, which can lead to high sharpness in the models sampled from them. In this paper, we develop the theory, the Bayesian setting, and a variational inference approach for a sharpness-aware posterior. Specifically, the models sampled from our sharpness-aware posterior, and the optimal approximate posterior that estimates it, have better flatness and hence possibly higher generalization ability. We conduct experiments that combine the sharpness-aware posterior with state-of-the-art Bayesian Neural Networks, showing that the flat-seeking counterparts outperform their baselines in all metrics of interest.
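To make the abstract's point about ensemble prediction and uncertainty quantification concrete, here is a minimal sketch that averages class probabilities across models sampled from a posterior and scores uncertainty with the predictive entropy. The function name `ensemble_predict`, the nested-list input layout, and the toy probabilities are assumptions made for illustration. The sketch does not implement the sharpness-aware posterior itself, and predictive entropy is only one common choice of uncertainty score.

```python
import math


def ensemble_predict(prob_samples):
    """Average class probabilities over sampled models and score uncertainty.

    prob_samples[s][n][c] is the probability that sampled model s assigns
    to class c for input n (hypothetical layout, assumed for this sketch).
    """
    num_models = len(prob_samples)
    num_inputs = len(prob_samples[0])
    num_classes = len(prob_samples[0][0])

    # Ensemble prediction: mean of the per-model predictive distributions.
    mean_probs = [
        [
            sum(prob_samples[s][n][c] for s in range(num_models)) / num_models
            for c in range(num_classes)
        ]
        for n in range(num_inputs)
    ]

    # Predictive entropy of the averaged distribution; 0 * log(0) is taken as 0.
    entropy = [
        -sum(p * math.log(p) for p in row if p > 0.0)
        for row in mean_probs
    ]
    return mean_probs, entropy


# Made-up outputs of 3 sampled models on 2 inputs with 2 classes.
samples = [
    [[0.9, 0.1], [0.6, 0.4]],
    [[0.8, 0.2], [0.3, 0.7]],
    [[1.0, 0.0], [0.6, 0.4]],
]
mean_probs, entropy = ensemble_predict(samples)
```

In this toy input the sampled models agree on the first example and disagree on the second, so the second example receives the higher entropy.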