A key challenge for neural networks in real-world applications is that they make overconfident errors on data that does not come from the training distribution. Identifying such inputs is known as Out-of-Distribution (OOD) detection. Many state-of-the-art OOD methods use an auxiliary dataset as a surrogate for OOD data during training to improve performance. However, these methods fail to fully exploit the local information embedded in the auxiliary dataset. In this work, we propose leveraging the information in the gradient of the loss function during training, so that the network not only learns a desired OOD score for each sample but also exhibits similar behavior in a local neighborhood around each sample. We also develop a novel energy-based sampling method that exposes the network to more informative OOD samples during training, which is especially important when the auxiliary dataset is large. We demonstrate the effectiveness of our method through extensive experiments on several OOD benchmarks, improving the existing state-of-the-art FPR95 by 4% on our ImageNet experiment. We further provide a theoretical analysis through the lens of certified robustness and Lipschitz analysis to establish the theoretical foundation of our work. We will publicly release our code after the review process.
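For readers unfamiliar with the two quantities the abstract mentions, the following is a minimal sketch of a standard energy score computed from classifier logits and of the FPR95 metric. It is not the sampling or training procedure proposed in this work. The function names `energy_score` and `fpr_at_95_tpr`, the `temperature` parameter, and the convention that a lower score means "more in-distribution" are assumptions made for illustration.

```python
import math


def energy_score(logits, temperature=1.0):
    """Free energy E(x) = -T * logsumexp(logits / T).

    Assumed convention: lower energy = more in-distribution-like.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -temperature * lse


def fpr_at_95_tpr(id_scores, ood_scores):
    """Fraction of OOD samples wrongly accepted as in-distribution when the
    threshold is set so that at least 95% of in-distribution samples are accepted.

    Assumed convention: a sample is accepted if its score <= threshold.
    """
    ranked = sorted(id_scores)
    n = len(ranked)
    k = -(-95 * n // 100)  # integer ceil(0.95 * n), avoids float rounding
    threshold = ranked[max(k, 1) - 1]
    false_positives = sum(1 for s in ood_scores if s <= threshold)
    return false_positives / len(ood_scores)
```

In this sketch a confident prediction (one large logit) yields a very negative energy, while a flat logit vector yields an energy close to `-T * log(num_classes)`, which is why energy can serve as an OOD score.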