Reliable application of machine learning is of primary importance for the practical deployment of deep learning methods. A fundamental challenge is that models are often unreliable because they are overconfident. In this paper, we estimate a model's reliability by measuring \emph{the agreement between its latent space and the latent space of a foundation model}. However, measuring the agreement between two different latent spaces is challenging because the spaces are incoherent, \eg, they may differ by arbitrary rotations and have different dimensionality. To overcome this incoherence, we design a \emph{neighborhood agreement measure} between latent spaces and find that this agreement is surprisingly well correlated with the reliability of a model's predictions. Further, we show that fusing neighborhood agreement into a model's predictive confidence in a post-hoc manner significantly improves its reliability. Theoretical analysis and extensive failure-detection experiments across various datasets verify the effectiveness of our method in both in-distribution and out-of-distribution settings.
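The abstract does not spell out the exact form of the neighborhood agreement measure or of the fusion step. One plausible instantiation, assumed here purely for illustration, compares each sample's k-nearest-neighbor set in the two latent spaces: because neighborhoods depend only on pairwise distances, they are unchanged by rotations and do not require the two spaces to share a dimensionality. The sketch below uses Euclidean k-NN, an overlap ratio, and a hypothetical convex-combination fusion with the maximum softmax probability; the function names and the `alpha` parameter are illustrative assumptions, not the paper's method.

```python
import numpy as np


def knn_indices(z: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest neighbors of each row of z (self excluded).

    Assumption: Euclidean distance; the paper may use a different metric.
    """
    sq = np.sum(z * z, axis=1)
    # Pairwise squared Euclidean distances: |a|^2 + |b|^2 - 2 a.b
    d2 = sq[:, None] + sq[None, :] - 2.0 * (z @ z.T)
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbor
    # Sort each row by distance and keep the k closest columns
    return np.argsort(d2, axis=1)[:, :k]


def neighborhood_agreement(z_model: np.ndarray, z_fm: np.ndarray, k: int = 10) -> np.ndarray:
    """Hypothetical agreement score: fraction of each sample's k nearest
    neighbors that are shared by the model's and the foundation model's
    latent spaces. The two spaces may have different dimensionality."""
    if z_model.shape[0] != z_fm.shape[0]:
        raise ValueError("both embeddings must cover the same samples")
    nn_model = knn_indices(z_model, k)
    nn_fm = knn_indices(z_fm, k)
    return np.array(
        [len(set(a.tolist()) & set(b.tolist())) / k for a, b in zip(nn_model, nn_fm)]
    )


def fused_confidence(max_softmax: np.ndarray, agreement: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Hypothetical post-hoc fusion: convex combination of the model's
    maximum softmax probability and the neighborhood agreement."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * max_softmax + (1.0 - alpha) * agreement
```

For example, rotating an embedding and zero-padding it to a higher dimension leaves every neighborhood, and hence every agreement score, unchanged, whereas two unrelated embeddings share only roughly k/(n-1) of each sample's neighbors on average.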