Out-of-distribution (OOD) detection is crucial for deploying robust and reliable machine-learning systems in open-world settings. Despite steady advances in OOD detectors, their interplay with modern training pipelines that maximize in-distribution (ID) accuracy and generalization remains under-explored. We investigate this interplay through a comprehensive empirical study. Fixing the architecture to the widely adopted ResNet-50, we benchmark 21 state-of-the-art post-hoc OOD detection methods across 56 ImageNet-trained models obtained via diverse training strategies, and evaluate them on eight OOD test sets. Contrary to the common assumption that higher ID accuracy implies better OOD detection, we uncover a non-monotonic relationship: OOD performance initially improves with accuracy but declines once advanced training recipes push accuracy beyond that of the standard baseline. Moreover, we observe a strong interdependence between training strategy, detector choice, and resulting OOD performance, indicating that no single method is universally optimal.
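To make the evaluation setup concrete, the following is a minimal sketch (not the paper's implementation) of two classic post-hoc detectors of the kind benchmarked here, maximum softmax probability (MSP) and the energy score, along with the standard AUROC metric computed from ID and OOD scores. It assumes class logits from a trained classifier are available as a NumPy array; all function names are illustrative.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def msp_score(logits):
    # Maximum softmax probability: higher means "more in-distribution".
    return softmax(logits).max(axis=1)

def energy_score(logits):
    # Energy-based score: logsumexp over logits; higher means "more ID".
    m = logits.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True))).squeeze(1)

def auroc(id_scores, ood_scores):
    # AUROC = probability a random ID sample scores above a random OOD
    # sample, computed via the rank-sum (Mann-Whitney U) formulation.
    scores = np.concatenate([id_scores, ood_scores])
    order = scores.argsort()
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_id, n_ood = len(id_scores), len(ood_scores)
    return (ranks[:n_id].sum() - n_id * (n_id + 1) / 2) / (n_id * n_ood)
```

Both detectors are "post-hoc" in the sense used above: they require no retraining and read only the logits of a fixed model, which is what makes it possible to sweep one detector across many differently trained checkpoints.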