Adversarially robust classifiers possess a trait that non-robust models lack: Perceptually Aligned Gradients (PAG), meaning that their gradients with respect to the input align well with human perception. Several works have identified PAG as a byproduct of robust training, but none has treated it as a standalone phenomenon or studied its implications. In this work, we focus on this trait and test whether \emph{Perceptually Aligned Gradients imply Robustness}. To this end, we develop a novel objective that directly promotes PAG when training classifiers, and we examine whether models with such gradients are more robust to adversarial attacks. Extensive experiments on multiple datasets and architectures confirm that models with aligned gradients exhibit significant robustness, exposing a surprising bidirectional connection between PAG and robustness. Lastly, we show that better gradient alignment leads to increased robustness, and we harness this observation to boost the robustness of existing adversarial training techniques.
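To make the idea of "an objective that directly promotes PAG" concrete, here is a minimal sketch. It is not the paper's exact formulation. It assumes that alignment is scored by the cosine similarity between the input gradient of each class logit and a hypothetical per-class target gradient `target_grads`, whose source is left unspecified here. It also assumes a weighting factor `lam` and a linear classifier, chosen so that the input gradient of logit c is simply `W[c]` and NumPy alone suffices. The function names `cosine` and `pag_regularized_loss` are illustrative.

```python
import numpy as np


def cosine(u, v, eps=1e-12):
    """Cosine similarity between two vectors; eps guards against zero norms."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + eps))


def pag_regularized_loss(W, b, x, y, target_grads, lam=1.0):
    """Cross-entropy plus a gradient-alignment penalty (illustrative sketch).

    W: (C, D) weights, b: (C,) bias, x: (D,) input, y: int class label.
    target_grads: (C, D) hypothetical "perceptually aligned" target
    gradients, one per class (an assumption, not from the source).
    lam: hypothetical weight of the alignment term.
    Returns (total_loss, cross_entropy, alignment_penalty).
    """
    logits = W @ x + b
    # Numerically stable log-softmax.
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    ce = -log_probs[y]
    # For a linear model, d(logit_c)/dx = W[c], so no autodiff is needed.
    input_grads = W
    # Misalignment per class is 1 - cos(grad_c, target_c); average over classes.
    align = float(np.mean([1.0 - cosine(input_grads[c], target_grads[c])
                           for c in range(W.shape[0])]))
    return ce + lam * align, ce, align
```

The penalty is 0 when every class gradient points exactly along its target and 2 when it points exactly opposite. For a deep network, the input gradients would come from automatic differentiation, and minimizing this term would require differentiating through those gradients (double backpropagation).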