Detecting out-of-distribution (OOD) inputs is critical for safely deploying deep learning models in real-world scenarios. In recent years, many OOD detectors have been developed, and their benchmarking has even been standardized, e.g., by OpenOOD. The number of post-hoc detectors is growing rapidly; they offer a way to protect a pre-trained classifier against natural distribution shifts and claim to be ready for real-world scenarios. However, their effectiveness against adversarial examples (AdEx) has been neglected in most studies. Even when an OOD detector does include AdEx in its experiments, the lack of uniform AdEx parameters makes it difficult to accurately evaluate the detector's performance. This paper investigates the adversarial robustness of 16 post-hoc detectors against various evasion attacks. It also discusses a roadmap for adversarial defense in OOD detectors that would improve their adversarial robustness. We believe that level 1 (AdEx on a unified dataset) should be added to the evaluation of any OOD detector to expose its limitations. The last level of the roadmap (defense against adaptive attacks) is included for completeness from an adversarial machine learning (AML) point of view, although we do not consider it the ultimate goal for OOD detectors.