Recently, the no-box adversarial attack, in which the attacker lacks access to the model's architecture, weights, and training data, has become the most practical and challenging attack setup. However, the potential and flexibility inherent in surrogate model selection in the no-box setting remain largely unrecognized. Inspired by the burgeoning interest in utilizing foundational models to address downstream tasks, this paper adopts the idea of 1) recasting adversarial attack as a downstream task, specifically image noise generation, in line with this emerging trend, and 2) introducing foundational models as surrogate models. Harnessing the concept of non-robust features, we elaborate on two guiding principles for surrogate model selection to explain why a foundational model is an optimal choice for this role. Paradoxically, however, we observe that these foundational models underperform. Analyzing this unexpected behavior within the feature space, we attribute the lackluster performance of foundational models (e.g., CLIP) to their significant representational capacity and, conversely, their lack of discriminative prowess. To mitigate this issue, we propose a margin-based loss strategy for fine-tuning foundational models on target images. The experimental results verify that our approach, which employs the basic Fast Gradient Sign Method (FGSM) attack algorithm, outperforms other, more complex algorithms. We conclude by urging the research community to consider surrogate models as crucial determinants of the effectiveness of adversarial attacks in no-box settings. The implications of our work bear relevance for improving the efficacy of such adversarial attacks and the overall robustness of AI systems.
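For readers unfamiliar with FGSM, the single-step attack named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's pipeline: the surrogate here is a hypothetical linear softmax classifier (standing in for a fine-tuned foundational model such as CLIP), the attack loss is plain cross-entropy, and all function names are invented for this example. The margin-based fine-tuning loss itself is not shown, since the abstract does not specify its form.

```python
import math

def logits(W, b, x):
    # Linear surrogate (an assumption for illustration): z = W x + b
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def input_gradient(W, b, x, y):
    # Gradient of cross-entropy w.r.t. the input: W^T (softmax(z) - onehot(y))
    p = softmax(logits(W, b, x))
    p[y] -= 1.0
    return [sum(p[k] * W[k][i] for k in range(len(W))) for i in range(len(x))]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(W, b, x, y, eps):
    # One step of size eps along the gradient sign, clipped to the valid range [0, 1]
    g = input_gradient(W, b, x, y)
    return [min(1.0, max(0.0, xi + eps * sign(gi))) for xi, gi in zip(x, g)]

def predict(W, b, x):
    z = logits(W, b, x)
    return max(range(len(z)), key=z.__getitem__)

# Toy usage: a two-class surrogate and a two-"pixel" input with true label 0
W = [[1.0, -1.0], [-1.0, 1.0]]
b = [0.0, 0.0]
x = [0.6, 0.4]
x_adv = fgsm(W, b, x, y=0, eps=0.2)
print(predict(W, b, x), predict(W, b, x_adv))
```

In the no-box setting the gradient is taken on the surrogate only; the perturbed input is then passed to the unseen target model, which is why the choice of surrogate matters so much for the attack's success.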