Despite the tremendous success of neural networks, benign images can be corrupted by adversarial perturbations that deceive these models. Intriguingly, images differ in their attackability: given an attack configuration, some images are easily corrupted, whereas others are more resistant. Evaluating image attackability has important applications in active learning, adversarial training, and attack enhancement, which has prompted growing interest in developing attackability measures. However, existing methods are scarce and suffer from two major limitations: (1) they rely on a model proxy to provide prior knowledge (e.g., gradients or minimal perturbations) for extracting model-dependent image features, yet in practice many task-specific models are not readily accessible; (2) the extracted features characterizing attackability lack visual interpretability, obscuring their direct relationship with the images. To address these limitations, we propose Object Texture Intensity (OTI), a novel model-free and visually interpretable measure that quantifies image attackability as the texture intensity of the image's semantic object. Theoretically, we justify OTI from the perspectives of decision boundaries and the mid- and high-frequency characteristics of adversarial perturbations. Comprehensive experiments demonstrate that OTI is both effective and computationally efficient. In addition, OTI provides the adversarial machine learning community with a visual understanding of attackability.
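The abstract does not specify how OTI computes texture intensity, but the core idea of a model-free score over a semantic-object region can be sketched as follows. This is an illustrative stand-in, not the paper's definition: it measures texture as mean gradient-magnitude energy inside an object mask, with `object_texture_intensity` a hypothetical name.

```python
import numpy as np

def object_texture_intensity(image, object_mask):
    """Illustrative texture-intensity score (assumed, not the paper's OTI):
    mean gradient-magnitude energy inside the semantic-object mask.
    No model is queried, so the score is model-free by construction."""
    # Collapse color channels to a single luminance-like plane.
    gray = image.mean(axis=2) if image.ndim == 3 else image
    # Per-pixel spatial gradients capture local texture strength.
    gy, gx = np.gradient(gray.astype(np.float64))
    grad_mag = np.hypot(gx, gy)
    mask = object_mask.astype(bool)
    # Average only over the object region; empty masks score zero.
    return grad_mag[mask].mean() if mask.any() else 0.0

# Toy comparison: a noisy (high-texture) object vs. a smooth one.
rng = np.random.default_rng(0)
textured = rng.uniform(0.0, 1.0, (32, 32))
smooth = np.full((32, 32), 0.5)
mask = np.zeros((32, 32), dtype=bool)
mask[8:24, 8:24] = True
assert object_texture_intensity(textured, mask) > object_texture_intensity(smooth, mask)
```

Under the abstract's framing, a higher score would indicate a more attackable image; any practical variant would also need a segmentation step to produce `object_mask`.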