Flat Seeking Bayesian Neural Networks

Bayesian Neural Networks (BNNs) provide a probabilistic interpretation for deep learning models by imposing a prior distribution over model parameters and inferring a posterior distribution based on observed data. The model sampled from the posterior distribution can be used for providing ensemble predictions and quantifying prediction uncertainty. It is well-known that deep learning models with lower sharpness have better generalization ability. However, existing posterior inferences are not aware of sharpness/flatness in terms of formulation, possibly leading to high sharpness for the models sampled from them. In this paper, we develop theories, the Bayesian setting, and the variational inference approach for the sharpness-aware posterior. Specifically, the models sampled from our sharpness-aware posterior, and the optimal approximate posterior estimating this sharpness-aware posterior, have better flatness, hence possibly possessing higher generalization ability. We conduct experiments by leveraging the sharpness-aware posterior with state-of-the-art Bayesian Neural Networks, showing that the flat-seeking counterparts outperform their baselines in all metrics of interest.

翻译：贝叶斯神经网络通过对模型参数施加先验分布，并基于观测数据推断后验分布，为深度学习模型提供概率解释。从后验分布采样的模型可用于集成预测与量化预测不确定性。已有研究表明，低尖锐度的深度学习模型具有更强的泛化能力。然而，现有后验推断方法在公式层面并未考虑尖锐度/平坦性，可能导致采样模型具有较高的尖锐度。本文发展了针对尖锐度感知后验的理论体系、贝叶斯框架及变分推断方法。具体而言，从我们提出的尖锐度感知后验中采样的模型，以及估计该后验的最优近似后验，均具有更优的平坦性，因此可能具备更强的泛化能力。通过将尖锐度感知后验与当前最优贝叶斯神经网络结合进行实验，结果表明：在所有相关指标上，平坦化模型均优于其基线模型。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

【ACL2020】多模态信息抽取，365页ppt

专知会员服务

151+阅读 · 2020年7月6日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日