Robustness to adversarial attacks is typically evaluated with adversarial accuracy. While essential, this metric does not capture all aspects of robustness and in particular leaves out the question of how many perturbations can be found for each point. In this work, we introduce an alternative approach, adversarial sparsity, which quantifies how difficult it is to find a successful perturbation given both an input point and a constraint on the direction of the perturbation. We show that sparsity provides valuable insight into neural networks in multiple ways: for instance, it illustrates important differences between current state-of-the-art robust models that accuracy analysis does not, and suggests approaches for improving their robustness. When applied to broken defenses, which are effective against weak attacks but not strong ones, sparsity can discriminate between the totally ineffective and the partially effective defenses. Finally, with sparsity we can measure increases in robustness that do not affect accuracy: we show for example that data augmentation can by itself increase adversarial robustness, without using adversarial training.
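To make the idea of counting perturbations per point more concrete, the following is a minimal sketch and not the definition used in this work. It assumes a toy linear classifier and estimates, by Monte Carlo sampling, the fraction of unit directions along which some step of length at most `eps` changes the predicted label. The names `fraction_of_adversarial_directions`, `predict`, `eps`, `n_directions`, and `n_steps` are hypothetical and chosen only for illustration.

```python
import numpy as np


def fraction_of_adversarial_directions(predict, x, eps, n_directions=2000,
                                       n_steps=20, seed=0):
    """Estimate the share of unit directions d for which some r in (0, eps]
    gives predict(x + r * d) != predict(x).

    Illustrative proxy only: this is an assumption made for the sketch,
    not the sparsity measure defined by the authors.
    """
    rng = np.random.default_rng(seed)
    label = predict(x[None, :])[0]

    # Normalised Gaussian samples are uniformly distributed on the unit sphere.
    directions = rng.standard_normal((n_directions, x.size))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)

    # Probe each direction at evenly spaced radii up to and including eps.
    radii = np.linspace(eps / n_steps, eps, n_steps)
    hits = np.zeros(n_directions, dtype=bool)
    for r in radii:
        hits |= predict(x[None, :] + r * directions) != label
    return hits.mean()


# Hypothetical toy model: a linear classifier on 2-D inputs.
w = np.array([1.0, 0.0])
b = 0.0


def predict(batch):
    return (batch @ w + b > 0).astype(int)


x = np.array([0.5, 0.0])
frac = fraction_of_adversarial_directions(predict, x, eps=1.0)
print(f"fraction of adversarial directions: {frac:.3f}")
```

For this toy setup, a direction at angle θ flips the label exactly when cos θ ≤ -0.5, so the estimate should come out close to 1/3. With `eps=0.4` no direction reaches the decision boundary and the estimate is 0. Two points that are both "non-robust" under adversarial accuracy can therefore differ widely in how many directions actually lead to a successful perturbation, which is the kind of distinction the abstract argues accuracy alone misses.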