Neural networks have demonstrated success in various domains, yet their performance can be significantly degraded by even a small input perturbation. Consequently, the construction of such perturbations, known as adversarial attacks, has gained significant attention, much of it within "white-box" scenarios where we have full access to the neural network. Existing attack algorithms, such as projected gradient descent (PGD), commonly take the sign of the raw gradient before updating the adversarial input, thereby discarding gradient magnitude information. In this paper, we present a theoretical analysis of how such a sign-based update rule influences step-wise attack performance, as well as its caveats. We also explain why previous attempts to use raw gradients directly have failed. Based on these insights, we further propose a new raw gradient descent (RGD) algorithm that eliminates the use of the sign operation. Specifically, we convert the constrained optimization problem into an unconstrained one by introducing a hidden variable of non-clipped perturbation that can move beyond the constraint. The effectiveness of the proposed RGD algorithm has been demonstrated extensively in experiments, outperforming PGD and other competitors in various settings without incurring any additional computational overhead. The code is available at https://github.com/JunjieYang97/RGD.
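The contrast between the two update rules can be illustrated with a minimal sketch. This is not the paper's implementation (see the repository above for that); it maximizes a toy concave objective under an l-infinity bound, where `loss_grad`, the target point, and all step sizes are hypothetical stand-ins for the network's loss gradient and attack hyperparameters. The PGD-style loop takes the sign of the gradient and projects back into the epsilon-ball each step; the RGD-style loop updates an unconstrained hidden variable `w` with the raw gradient and clips only when forming the perturbation:

```python
import numpy as np

def loss_grad(delta):
    # Gradient of a toy concave objective L(delta) = -||delta - target||^2,
    # maximized at `target`; a stand-in for the network's loss gradient.
    target = np.array([0.05, -0.02])
    return -2.0 * (delta - target)

eps, steps = 0.03, 40  # l_inf budget and iteration count (demo values)

# PGD-style update: sign of the gradient, then projection onto the eps-ball.
delta_pgd = np.zeros(2)
for _ in range(steps):
    delta_pgd = np.clip(delta_pgd + 0.01 * np.sign(loss_grad(delta_pgd)),
                        -eps, eps)

# RGD-style idea: the hidden variable w is updated with the raw gradient
# (no sign) and may move beyond the constraint; only the perturbation
# delta = clip(w) is ever constrained.
w = np.zeros(2)
for _ in range(steps):
    delta = np.clip(w, -eps, eps)
    w = w + 0.5 * loss_grad(delta)  # raw gradient step, no sign()
delta_rgd = np.clip(w, -eps, eps)
```

Both loops reach the constrained optimum `[0.03, -0.02]` here, but the raw-gradient variant retains magnitude information: coordinates with small gradients receive proportionally small updates instead of fixed-size sign steps.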