Adversarial attacks on deep neural network models have seen rapid development and are extensively used to study the stability of these networks. Among the various adversarial strategies, Projected Gradient Descent (PGD) is widely adopted in computer vision due to its effectiveness and ease of implementation, which also make it suitable for adversarial training. In this work, we observe that in many cases the perturbations computed with PGD predominantly affect only a portion of the singular value spectrum of the original image, suggesting that these perturbations are approximately low-rank. Motivated by this observation, we propose a variant of PGD that efficiently computes a low-rank attack. We extensively validate our method on a range of standard models as well as on robust models that have undergone adversarial training. Our analysis indicates that the proposed low-rank PGD can be used effectively in adversarial training thanks to its straightforward, fast implementation and competitive performance. Notably, we find that low-rank PGD often performs comparably to, and sometimes even outperforms, the traditional full-rank PGD attack, while using significantly less memory.
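To make the idea concrete, below is a minimal, hypothetical sketch of a rank-constrained PGD attack in PyTorch: the usual signed-gradient ascent step is followed by a truncated SVD of the perturbation, keeping only its leading `rank` singular components before projecting back into the ε-ball. The function name `low_rank_pgd`, the point at which the truncation is applied, and all hyperparameters are illustrative assumptions, not the paper's exact algorithm.

```python
# A minimal, hypothetical sketch of a rank-constrained PGD attack in PyTorch.
# Assumptions (not taken from the paper): L_inf ball of radius eps, signed-gradient
# ascent, and a truncated SVD of the perturbation applied once per iteration.
import torch
import torch.nn.functional as F

def low_rank_pgd(model, x, y, eps=8/255, alpha=2/255, steps=10, rank=4):
    """PGD-style attack whose perturbation is truncated to `rank` at each step.

    `x` is a batch of images in [0, 1] with shape (B, C, H, W); `y` are labels.
    """
    delta = torch.zeros_like(x)
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            # Standard PGD ascent step on the loss.
            delta = delta + alpha * grad.sign()
            # Truncate each (batch, channel) slice of the perturbation to low rank;
            # torch.linalg.svd batches over all but the last two dimensions.
            U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
            S[..., rank:] = 0.0
            delta = U @ torch.diag_embed(S) @ Vh
            # Project back into the eps-ball and the valid image range; the
            # element-wise clamp may slightly perturb the exact low-rank structure.
            delta = delta.clamp(-eps, eps)
            delta = (x + delta).clamp(0, 1) - x
    return (x + delta).detach()
```

One appeal of such a formulation, and presumably part of the memory saving mentioned in the abstract, is that a rank-r perturbation can be stored through its SVD factors, roughly r(H + W + 1) numbers per channel instead of H·W; whether this matches the authors' actual implementation is an assumption here.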