Assessing the robustness of deep neural networks against out-of-distribution inputs is crucial, especially in safety-critical domains such as autonomous driving, but also in safety systems where malicious actors can digitally alter inputs to circumvent safety guards. However, designing effective out-of-distribution tests that cover all possible scenarios while preserving accurate label information is challenging. Existing methods often compromise on the variety of attacks, on how tightly the attacks are constrained, or sometimes on both. As a first step towards a more holistic robustness evaluation of image classification models, we introduce an attack method based on image solarization that is conceptually straightforward yet preserves the global structure of natural images regardless of its intensity. Through comprehensive evaluations of multiple ImageNet models, we demonstrate that the attack degrades accuracy significantly, provided it is not integrated into the training augmentations. Interestingly, even when it is integrated, models do not become fully immune to the accuracy deterioration. In other settings, the attack can often be simplified into a black-box attack with model-independent parameters. Defenses against other corruptions do not consistently transfer to our attack. Project website: https://github.com/paulgavrikov/adversarial_solarization
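To make the idea of a solarization-based attack concrete, the following is a minimal sketch and not the authors' implementation. It assumes images are float arrays in [0, 1], that solarization inverts every pixel at or above a single threshold, and that the attacker only has query access to a classifier. The names `solarize`, `solarization_attack`, and `predict` are hypothetical, and the plain grid search over thresholds is an illustrative stand-in for whatever parameterization and search the paper actually uses.

```python
import numpy as np


def solarize(image, threshold):
    """Invert every pixel whose value is at or above `threshold`.

    Assumes `image` holds floats in [0, 1]. A threshold above 1.0
    leaves the image unchanged; a threshold of 0.0 inverts it fully.
    """
    image = np.asarray(image, dtype=np.float64)
    return np.where(image >= threshold, 1.0 - image, image)


def solarization_attack(predict, image, label, thresholds=None):
    """Grid-search a threshold that changes the predicted label.

    `predict` is a hypothetical callable mapping an image to a class id;
    only its outputs are used, so this is a black-box search.
    Returns (threshold, solarized image), or (None, image) if no
    threshold in the grid flips the prediction.
    """
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 11)
    for t in thresholds:
        candidate = solarize(image, t)
        if predict(candidate) != label:
            return float(t), candidate
    return None, np.asarray(image, dtype=np.float64)


if __name__ == "__main__":
    # Toy stand-in for a model: class 1 if the image is bright on average.
    toy_predict = lambda x: int(np.mean(x) > 0.5)
    bright = np.full((4, 4), 0.8)
    t, adversarial = solarization_attack(toy_predict, bright, label=1)
    print("threshold:", t, "new prediction:", toy_predict(adversarial))
```

Because the search only compares predicted labels, it never touches gradients or model internals, which is the sense in which such an attack can be run as a black-box attack. A real evaluation would replace `toy_predict` with an ImageNet classifier and measure accuracy over a dataset rather than a single image.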