This work proposes a novel perspective on adversarial attacks by introducing the concepts of sample attackability and sample robustness. Adversarial attacks add small, imperceptible perturbations to the input that cause large, undesired changes in the output of deep learning models. Despite extensive research on generating adversarial attacks and building defense systems, little work has examined adversarial attacks from an input-data perspective. We propose a deep-learning-based method for detecting the most attackable and the most robust samples in an unseen dataset for an unseen target model. The method uses a neural network that takes a sample as input and outputs a measure of its attackability or robustness. We evaluate it across a range of models and attack methods, and the results demonstrate its effectiveness in detecting the samples most likely to be affected by adversarial attacks. Understanding sample attackability has important implications for future work on sample-selection tasks. For example, in active learning the acquisition function can be designed to select the most attackable samples, and in adversarial training only the most attackable samples need to be selected for augmentation.
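To make the idea of ranking samples by attackability concrete, the following is a minimal toy sketch and not the deep-learning-based detector proposed here. It assumes a linear binary classifier sign(w·x + b), for which the smallest L2 perturbation that moves a sample onto the decision boundary is |w·x + b| / ‖w‖, and it treats a smaller minimum perturbation as higher attackability. The function names `min_l2_perturbation` and `most_attackable`, the weights, and the sample values are all hypothetical and chosen only for illustration.

```python
import math


def min_l2_perturbation(w, b, x):
    """Smallest L2 perturbation that moves x onto the boundary of sign(w.x + b).

    Assumption: a linear classifier, so this distance has a closed form.
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(score) / norm


def most_attackable(samples, w, b, k):
    """Return indices of the k samples with the smallest minimum perturbation."""
    sizes = [min_l2_perturbation(w, b, x) for x in samples]
    return sorted(range(len(samples)), key=lambda i: sizes[i])[:k]


# Hypothetical classifier and data, for illustration only.
w, b = [3.0, 4.0], 0.0
samples = [[1.0, 1.0], [0.1, 0.0], [-2.0, -2.0]]

# Indices of the two samples that need the smallest perturbation to reach the boundary.
selected = most_attackable(samples, w, b, k=2)
```

In a sample-selection setting such as adversarial training, a ranking like `selected` could decide which samples are augmented; for deep models no closed form exists, which is where a learned attackability detector would take the place of `min_l2_perturbation`.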