Adversarially robust models are locally smooth around each data sample, so that small perturbations cannot drastically change model outputs. In modern systems, such smoothness is usually obtained via Adversarial Training, which explicitly enforces models to perform well on perturbed examples. In this work, we show the surprising effectiveness of instead regularizing the gradient with respect to model inputs on natural examples only. Penalizing the input Gradient Norm is commonly believed to be a much inferior approach. Our analyses identify that the performance of Gradient Norm regularization critically depends on the smoothness of activation functions, and that it is in fact extremely effective on modern vision transformers, which adopt smooth activations over piecewise linear ones (e.g., ReLU), contrary to prior belief. On ImageNet-1k, Gradient Norm training achieves >90% of the performance of state-of-the-art PGD-3 Adversarial Training (52% vs. 56%), while using only 60% of its computation cost and requiring no complex adversarial optimization. Our analyses also highlight the relationship between model robustness and properties of natural input gradients, such as asymmetric sample and channel statistics. Surprisingly, we find that model robustness can be significantly improved by simply regularizing its gradients to concentrate on image edges, without explicit conditioning on the gradient norm.
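The core objective described above — standard training loss on natural examples plus a penalty on the norm of the loss gradient with respect to the input — can be illustrated with a minimal sketch. This is a hypothetical toy example, not the authors' implementation: it uses a linear softmax classifier, for which the input gradient has the closed form W^T(p - e_y), so no automatic differentiation is needed; the weighting coefficient `lam` is an assumed hyperparameter.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def gradient_norm_loss(W, x, y, lam=0.1):
    """Cross-entropy on a *natural* example plus an input-gradient-norm penalty.

    For logits W @ x, the gradient of the cross-entropy w.r.t. the input x
    is W.T @ (p - e_y), where p is the softmax output and e_y the one-hot label.
    """
    p = softmax(W @ x)
    ce = -np.log(p[y])
    onehot = np.zeros_like(p)
    onehot[y] = 1.0
    grad_x = W.T @ (p - onehot)          # dCE/dx in closed form
    return ce + lam * np.linalg.norm(grad_x)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))              # 3 classes, 5 input features
x = rng.normal(size=5)
print(gradient_norm_loss(W, x, y=1))
```

In a deep network the input gradient would instead be obtained by backpropagation with a second-order graph (e.g. `create_graph=True` in PyTorch's `torch.autograd.grad`), so that the penalty itself can be differentiated during training.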