The emergence of deep neural networks (DNNs) has transformed many domains, enabling solutions to complex tasks in image recognition, natural language processing, and scientific problem-solving. However, this progress has also exposed a concerning vulnerability: adversarial examples. These inputs carry carefully crafted perturbations that are often imperceptible to humans, yet they can cause machine learning models to make erroneous predictions, raising concerns for safety-critical applications. An intriguing property of this phenomenon is transferability: adversarial examples crafted for one model can also deceive another model, often one with a different architecture. This property enables "black-box" attacks that do not require detailed knowledge of the target model. This survey explores the landscape of adversarial transferability. We categorize existing methods for enhancing adversarial transferability and discuss the fundamental principles guiding each approach. While most research concentrates on image classification, we extend our discussion to other vision tasks and beyond. We also discuss challenges and future prospects, highlighting the importance of fortifying DNNs against adversarial vulnerabilities in an evolving landscape.
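The transfer setting described in the abstract can be illustrated with a toy sketch. It is not taken from the survey: the two linear classifiers, their hand-picked weights, the input, and the step size `eps` are all hypothetical, and the fast gradient sign method (FGSM) is used here only as one simple, well-known way to craft a perturbation. The perturbation is computed from the surrogate model alone and then evaluated on a separate target model whose parameters the attack never reads.

```python
import math

def score(w, b, x):
    # Linear decision score w.x + b.
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    # Binary label: 1 if the score is positive, else 0.
    return 1 if score(w, b, x) > 0 else 0

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    # Gradient of the logistic loss w.r.t. the input is (p - y) * w,
    # where p = sigmoid(w.x + b). FGSM steps eps along its sign.
    p = 1.0 / (1.0 + math.exp(-score(w, b, x)))
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical models: (weights, bias). The attacker sees only SURROGATE.
SURROGATE = ([1.0, 2.0, -1.0], 0.0)
TARGET = ([1.5, 1.0, -0.5], 0.1)

# Hypothetical clean input with true label 1.
x, y = [0.3, 0.2, -0.1], 1

# Craft the adversarial example using the surrogate only.
x_adv = fgsm(*SURROGATE, x, y, eps=0.3)

# Black-box evaluation: query the target on clean and perturbed inputs.
clean_pred = predict(*TARGET, x)
adv_pred = predict(*TARGET, x_adv)
transferred = clean_pred == y and adv_pred != y
```

In this toy setup the perturbation transfers because the two weight vectors point in similar directions. The step size is far too large to count as imperceptible; real attacks on image classifiers use much smaller budgets, and whether a perturbation transfers between deep models is an empirical question rather than something this sketch guarantees.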