Since the seminal paper of Hendrycks et al. arXiv:1610.02136, post-hoc deep Out-of-Distribution (OOD) detection has expanded rapidly. As a result, practitioners working on safety-critical applications and seeking to improve the robustness of a neural network now have a plethora of methods to choose from. However, no single method outperforms all others on every dataset arXiv:2210.07242, so the current best practice is to test all the methods on the datasets at hand. This paper shifts the focus from developing new methods to effectively combining existing ones to enhance OOD detection. We propose and compare four different strategies for integrating multiple detection scores into a unified OOD detector, based on techniques such as majority vote, empirical and copula-based cumulative distribution function (CDF) modeling, and multivariate quantiles based on optimal transport. We extend common OOD evaluation metrics, such as AUROC and FPR at a fixed TPR, to these multi-dimensional OOD detectors, allowing us to evaluate them and compare them with individual methods on extensive benchmarks. Furthermore, we propose a series of guidelines for choosing which OOD detectors to combine in more realistic settings, i.e., in the absence of known OOD data, relying on principles drawn from Outlier Exposure arXiv:1812.04606. The code is available at https://github.com/paulnovello/multi-ood.
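To make two of the combination strategies concrete, here is a minimal NumPy sketch of majority vote and empirical multivariate CDF modeling. It is not the authors' implementation (that lives in the linked repository). It assumes that each of the k detectors outputs a score where higher means "more in-distribution", that an ID calibration set is available, and that thresholds are set to accept a target fraction (TPR) of ID data. The function names `fit_thresholds`, `majority_vote`, `empirical_cdf_score`, and `cdf_detector` are hypothetical.

```python
import numpy as np

# Assumed convention (hypothetical sketch): every detector outputs a score where
# HIGHER means "more in-distribution". Flip the sign of any detector that does not.


def fit_thresholds(id_scores: np.ndarray, tpr: float = 0.95) -> np.ndarray:
    """Per-detector thresholds accepting a fraction `tpr` of ID calibration data.

    id_scores: array of shape (n_id, k), one column per detector.
    """
    return np.quantile(id_scores, 1.0 - tpr, axis=0)


def majority_vote(test_scores: np.ndarray, thresholds: np.ndarray) -> np.ndarray:
    """Return True (OOD) where a strict majority of the k detectors reject."""
    rejects = test_scores < thresholds  # (n_test, k): True where a detector says OOD
    return rejects.sum(axis=1) > test_scores.shape[1] / 2


def empirical_cdf_score(test_scores: np.ndarray, id_scores: np.ndarray) -> np.ndarray:
    """Multivariate empirical CDF of the ID scores evaluated at each test point.

    F(s) = fraction of ID points x_i with x_i <= s on every coordinate.
    A low value means few ID points are dominated by s, i.e. s looks less
    in-distribution, so F(s) serves as a single combined score.
    """
    # Broadcast to (n_test, n_id, k), then require domination on all k axes.
    dominated = np.all(id_scores[None, :, :] <= test_scores[:, None, :], axis=2)
    return dominated.mean(axis=1)


def cdf_detector(test_scores: np.ndarray, id_scores: np.ndarray,
                 tpr: float = 0.95) -> np.ndarray:
    """Return True (OOD) where the combined CDF score falls below the ID quantile."""
    # Calibrate the threshold on the ID set itself (in-sample, for simplicity).
    id_cdf = empirical_cdf_score(id_scores, id_scores)
    threshold = np.quantile(id_cdf, 1.0 - tpr)
    return empirical_cdf_score(test_scores, id_scores) < threshold


# Toy example: 4 ID calibration samples scored by k = 3 detectors.
id_scores = np.array([[1.0, 2.0, 0.0],
                      [2.0, 3.0, 1.0],
                      [3.0, 4.0, 2.0],
                      [4.0, 5.0, 3.0]])
test_scores = np.array([[0.5, 0.5, 0.5],   # rejected by all three detectors
                        [3.0, 4.0, 2.0],   # accepted by all three
                        [0.5, 4.0, 0.5]])  # rejected by two of three

thresholds = fit_thresholds(id_scores, tpr=0.75)
print(majority_vote(test_scores, thresholds).tolist())          # [True, False, True]
print(empirical_cdf_score(test_scores, id_scores).tolist())     # [0.0, 0.75, 0.0]
print(cdf_detector(test_scores, id_scores, tpr=0.75).tolist())  # [True, False, True]
```

The broadcast in `empirical_cdf_score` uses O(n_test · n_id · k) memory, which is fine for a sketch but not for large calibration sets. The copula-based and optimal-transport quantile strategies mentioned above are not shown here.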