Deep neural networks have achieved outstanding performance on a wide range of tasks, but they have a critical issue: they make over-confident predictions even for completely unknown samples. Many approaches have been proposed that successfully filter out these unknown samples, but each considers only a narrow, specific task, such as misclassification detection, open-set recognition, or out-of-distribution detection. In this work, we argue that these tasks should be treated as fundamentally the same problem, because an ideal model should be able to detect unknowns in all of them. We therefore introduce the unknown detection task, which integrates these previously separate tasks, to rigorously examine how well deep neural networks detect a wide spectrum of unknown samples. To this end, we construct unified benchmark datasets at different scales and compare the unknown detection capabilities of popular existing methods. We find that Deep Ensemble consistently outperforms the other approaches in detecting unknowns; however, every method succeeds only on a specific type of unknown. The reproducible code and benchmark datasets are available at https://github.com/daintlab/unknown-detection-benchmarks.
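To make the comparison concrete, the sketch below shows one common way a Deep Ensemble can score inputs for unknown detection, together with a threshold-free evaluation metric. It assumes, for illustration only, that the confidence score is the maximum of the ensemble's averaged softmax probabilities and that performance is summarized with AUROC, where samples the model should accept are positives and unknown samples are negatives. The function names (`softmax`, `ensemble_confidence`, `auroc`) are hypothetical, and the benchmark's actual scoring and metrics may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one logit vector."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_confidence(member_logits):
    """Hypothetical confidence score for one input.

    member_logits: one logit vector per ensemble member (same number of classes).
    Averages the members' softmax outputs and returns the maximum class probability;
    a low value suggests the input may be unknown.
    """
    probs = [softmax(l) for l in member_logits]
    num_classes = len(probs[0])
    avg = [sum(p[c] for p in probs) / len(probs) for c in range(num_classes)]
    return max(avg)


def auroc(known_scores, unknown_scores):
    """AUROC as the probability that a random known sample scores higher than a
    random unknown sample (ties count as half). Quadratic time; fine for a sketch."""
    wins = 0.0
    for a in known_scores:
        for b in unknown_scores:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(known_scores) * len(unknown_scores))


# Usage: members that agree yield high confidence; disagreement lowers it.
agree = ensemble_confidence([[5.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
disagree = ensemble_confidence([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]])
print(agree > disagree)  # True
```

Averaging over members is what lets disagreement among independently trained models lower the score, which is one intuition for why ensembles tend to detect unknowns well.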