Bayesian Neural Networks (BNNs) provide a probabilistic interpretation of deep learning models by imposing a prior distribution over model parameters and inferring a posterior distribution from observed data. Models sampled from the posterior can be used to form ensemble predictions and to quantify predictive uncertainty. It is well known that deep learning models with lower sharpness generalize better. However, existing posterior inference methods are not sharpness-aware in their formulation, so the models sampled from them may have high sharpness. In this paper, we develop the theory, the Bayesian setting, and the variational inference approach for a sharpness-aware posterior. Specifically, both the models sampled from our sharpness-aware posterior and the optimal approximate posterior that estimates it have better flatness, and hence possibly higher generalization ability. We conduct experiments that combine the sharpness-aware posterior with state-of-the-art Bayesian Neural Networks, showing that the flat-seeking counterparts outperform their baselines in all metrics of interest.