Prevalent deterministic deep-learning models are significantly over-confident under distribution shifts. Probabilistic approaches can reduce this problem but are computationally costly. In this paper, we propose Density-Softmax, a fast and lightweight deterministic method that improves the calibration of uncertainty estimates by combining a density function with the softmax layer. Using the likelihood value of the latent representation, our approach produces more uncertain predictions when test samples are far from the training samples. Theoretically, we show that Density-Softmax yields high-quality uncertainty estimates with neural networks: it is the solution of the minimax uncertainty risk and is distance-aware, which reduces the over-confidence of the standard softmax. Empirically, our method is about as computationally efficient as a deterministic model with standard softmax and a single forward pass, on shifted toy, vision, and language datasets across modern deep-learning architectures. Notably, Density-Softmax uses 4 times fewer parameters than Deep Ensembles and has 6 times lower latency than a Rank-1 Bayesian Neural Network, while achieving competitive predictive performance and lower calibration errors under distribution shifts.
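The core idea stated in the abstract, that a low latent likelihood should make the softmax output more uncertain, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the "combination" is a multiplication of the logits by a likelihood normalised to (0, 1], and it uses a diagonal Gaussian as a stand-in for the density model. All function and variable names are hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def gaussian_log_density(z, mean, var):
    # Log-density of a diagonal Gaussian; a stand-in (assumption) for the
    # density model fitted on training latent representations.
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (zi - mi) ** 2 / v)
        for zi, mi, v in zip(z, mean, var)
    )

def density_scaled_softmax(logits, z, mean, var, max_log_density):
    # Normalise the likelihood to (0, 1] relative to a reference maximum
    # (assumption), then scale the logits before applying softmax.
    log_ratio = gaussian_log_density(z, mean, var) - max_log_density
    scale = math.exp(min(0.0, log_ratio))
    return softmax([scale * v for v in logits])

# Hypothetical 2-D latent space with training latents centred at the origin.
mean, var = [0.0, 0.0], [1.0, 1.0]
max_log_density = gaussian_log_density(mean, mean, var)
logits = [4.0, 0.0]  # the classifier head strongly prefers class 0

near_probs = density_scaled_softmax(logits, [0.1, 0.0], mean, var, max_log_density)
far_probs = density_scaled_softmax(logits, [6.0, 6.0], mean, var, max_log_density)
print(near_probs)  # stays confident in class 0
print(far_probs)   # moves towards the uniform distribution
```

With the same logits, the latent point close to the training data keeps a confident prediction, while the distant point is pushed towards a uniform distribution over classes. This is the distance-aware behaviour the abstract describes, shown here only under the assumptions above.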