Deep neural networks are increasingly used in a wide range of technologies and services, but they remain highly susceptible to out-of-distribution (OOD) samples, that is, samples drawn from a distribution different from that of the original training set. A common approach to address this issue is to endow deep neural networks with the ability to detect OOD samples. Several benchmarks have been proposed to design and validate OOD detection techniques. However, many of them rely on far-OOD samples drawn from very different distributions, and thus lack the complexity needed to capture the nuances of real-world scenarios. In this work, we introduce a comprehensive benchmark for OOD detection, based on ImageNet and Places365, that assigns individual classes as in-distribution or out-of-distribution depending on their semantic similarity to the training set. Several techniques can be used to determine which classes should be considered in-distribution, yielding benchmarks with varying properties. Experimental results on different OOD detection techniques show how their measured efficacy depends on the selected benchmark, and how confidence-based techniques may outperform classifier-based ones on near-OOD samples.
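To make the "confidence-based" family of techniques mentioned above concrete, the following is a minimal sketch of the maximum softmax probability (MSP) baseline, a common confidence-based OOD score: a sample whose top softmax probability falls below a threshold is flagged as OOD. The threshold value and the toy logits are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def msp_score(logits):
    # Maximum softmax probability: higher means the classifier is
    # more confident, which MSP treats as evidence of in-distribution.
    return softmax(logits).max(axis=-1)

def flag_ood(logits, threshold=0.5):
    # Illustrative threshold; in practice it is tuned on validation data.
    return msp_score(logits) < threshold

# Toy logits for two samples: one confident, one near-uniform.
logits = np.array([[8.0, 0.5, 0.2],   # peaked -> kept as in-distribution
                   [1.1, 1.0, 0.9]])  # near-uniform -> flagged as OOD
print(flag_ood(logits))  # -> [False  True]
```

On near-OOD benchmarks such as the one proposed here, scores of this kind are computed per test sample and evaluated with threshold-free metrics (e.g. AUROC) against the in/out class assignment.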