The vulnerability of Convolutional Neural Networks (CNNs) to adversarial samples has recently attracted significant attention in the machine learning community. Recent studies have further revealed the existence of universal adversarial perturbations (UAPs), which are image-agnostic and highly transferable across different CNN models. This survey focuses on recent advances in UAPs for the image classification task. We divide UAP methods into two categories, noise-based attacks and generator-based attacks, and give an overview of representative methods in each. By presenting the computational details of these methods, we summarize the loss functions used for learning UAPs. We then evaluate these loss functions within consistent training frameworks, both noise-based and generator-based. The evaluation covers a wide range of attack settings: black-box and white-box attacks, targeted and untargeted attacks, and an examination of defense mechanisms. Our quantitative results yield several findings on the effectiveness of different loss functions, the selection of surrogate CNN models, the impact of training data and data size, and the training frameworks used to craft universal attackers. Finally, to promote future research on universal adversarial attacks, we provide visualizations of the perturbations and discuss potential research directions.