Deep learning models, including modern systems such as large language models, are well known to offer unreliable estimates of the uncertainty of their decisions. To improve calibration, that is, the quality of a model's confidence levels, common approaches add either data-dependent or data-independent regularization terms to the training loss. Data-dependent regularizers have recently been introduced in the context of conventional frequentist learning to penalize deviations between confidence and accuracy. In contrast, data-independent regularizers are at the core of Bayesian learning, where they enforce adherence of the variational distribution in the model parameter space to a prior density. The former approach cannot quantify epistemic uncertainty, while the latter is severely affected by model misspecification. In light of the limitations of both methods, this paper proposes an integrated framework, referred to as calibration-aware Bayesian neural networks (CA-BNNs), that applies both regularizers while optimizing over a variational distribution as in Bayesian learning. Numerical results validate the advantages of the proposed approach in terms of expected calibration error (ECE) and reliability diagrams.
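For readers unfamiliar with the two evaluation tools named above, the following is a minimal sketch of how ECE and the points of a reliability diagram are commonly computed from a model's top-class confidences. It is not taken from the paper: the function names `reliability_bins` and `expected_calibration_error`, the default of 10 equal-width bins, and the bin-edge convention are all assumptions made for illustration.

```python
def reliability_bins(confidences, correct, n_bins=10):
    """Group predictions into equal-width confidence bins.

    Returns a list of (mean_confidence, accuracy, count) tuples, one per
    non-empty bin; these are the points plotted in a reliability diagram.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    sums_conf = [0.0] * n_bins
    sums_hit = [0.0] * n_bins
    counts = [0] * n_bins
    for conf, hit in zip(confidences, correct):
        # Assumed convention: bin k covers [k/n_bins, (k+1)/n_bins),
        # and a confidence of exactly 1.0 is placed in the last bin.
        k = min(int(conf * n_bins), n_bins - 1)
        sums_conf[k] += conf
        sums_hit[k] += 1.0 if hit else 0.0
        counts[k] += 1
    return [
        (sums_conf[k] / counts[k], sums_hit[k] / counts[k], counts[k])
        for k in range(n_bins)
        if counts[k] > 0
    ]


def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: count-weighted mean of |accuracy - mean confidence|."""
    total = len(confidences)
    if total == 0:
        raise ValueError("at least one prediction is required")
    bins = reliability_bins(confidences, correct, n_bins)
    return sum(count / total * abs(acc - conf) for conf, acc, count in bins)
```

As a usage example, four predictions made with confidence 0.9 of which three are correct fall into a single bin with accuracy 0.75, so `expected_calibration_error([0.9] * 4, [1, 1, 1, 0])` evaluates to 0.15 up to floating-point rounding. A perfectly calibrated model would have every bin lie on the diagonal of the reliability diagram and an ECE of zero.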