Recently, RobustBench (Croce et al. 2020) has become a widely recognized benchmark for the adversarial robustness of image classification networks. In its most commonly reported sub-task, RobustBench evaluates and ranks the adversarial robustness of trained neural networks on CIFAR10 under AutoAttack (Croce and Hein 2020b) with l-inf perturbations limited to eps = 8/255. With the currently best-performing models scoring around 60% of the baseline, it is fair to characterize this benchmark as quite challenging. Despite its general acceptance in recent literature, we aim to foster discussion about the suitability of RobustBench as a key indicator of robustness that generalizes to practical applications. Our line of argumentation against this is two-fold and supported by extensive experiments presented in this paper: We argue that I) the alteration of the data by AutoAttack with l-inf, eps = 8/255 is unrealistically strong, resulting in close to perfect detection rates of adversarial samples even by simple detection algorithms and human observers, and we show that other attack methods are much harder to detect while achieving similar success rates; and II) results on low-resolution data sets like CIFAR10 do not generalize well to higher-resolution images, as gradient-based attacks appear to become even more detectable with increasing resolution.
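For reference, the evaluation protocol discussed above (AutoAttack under the l-inf threat model with eps = 8/255 on CIFAR10) can be reproduced in a few lines. The following is a minimal sketch, assuming the publicly available robustbench and autoattack Python packages; the model name 'Standard' is used purely as an illustrative entry from the RobustBench model zoo.

```python
import torch
from robustbench.data import load_cifar10
from robustbench.utils import load_model
from autoattack import AutoAttack

# Load a small batch of CIFAR10 test images and a model from the RobustBench zoo
# ('Standard' is an illustrative, non-robustly trained baseline entry).
x_test, y_test = load_cifar10(n_examples=64)
model = load_model(model_name='Standard', dataset='cifar10', threat_model='Linf')
model.eval()

# Run the standard AutoAttack ensemble under the l-inf threat model with eps = 8/255,
# i.e. the configuration of the RobustBench sub-task referred to in the abstract.
adversary = AutoAttack(model, norm='Linf', eps=8 / 255, version='standard')
x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=64)

# Robust accuracy: fraction of adversarial samples that are still classified correctly.
with torch.no_grad():
    robust_acc = (model(x_adv).argmax(dim=1) == y_test).float().mean().item()
print(f'Robust accuracy at eps = 8/255: {robust_acc:.3f}')
```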