Deep neural networks are widely recognized as being vulnerable to adversarial perturbations. To overcome this challenge, developing a robust classifier is crucial. So far, two well-known defenses have been adopted to improve the learning of robust classifiers, namely adversarial training (AT) and Jacobian regularization. However, the two approaches behave differently against adversarial perturbations. First, our work carefully analyzes and characterizes these two schools of approaches, both theoretically and empirically, to demonstrate how each one affects the robust learning of a classifier. Next, we propose a novel Optimal Transport with Jacobian regularization method, dubbed OTJR, which incorporates input-output Jacobian regularization into AT by leveraging optimal transport theory. In particular, we employ the Sliced Wasserstein (SW) distance, which can efficiently push the adversarial samples' representations closer to those of clean samples, regardless of the number of classes within the dataset. The SW distance also provides the adversarial samples' movement directions, which are much more informative and powerful for Jacobian regularization. Our extensive experiments demonstrate the effectiveness of the proposed method. In particular, it consistently enhances the model's robustness on the CIFAR-100 dataset under various adversarial attack settings, achieving up to 28.49% robust accuracy under AutoAttack.
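To make the Sliced Wasserstein distance mentioned above concrete, the following is a minimal sketch of a Monte Carlo estimate of the squared SW distance between two equally sized batches of feature vectors. It is an illustration under stated assumptions, not the authors' implementation: the function name `sliced_wasserstein`, its parameters `n_projections` and `seed`, and the use of plain Python lists are all hypothetical choices, and the sketch covers only the distance itself, not adversarial training or Jacobian regularization.

```python
import math
import random


def sliced_wasserstein(xs, ys, n_projections=64, seed=0):
    """Monte Carlo estimate of the squared sliced 2-Wasserstein distance.

    xs, ys: equally sized lists of feature vectors (lists of floats).
    All names and defaults here are illustrative assumptions.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally sized batches")
    dim = len(xs[0])
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_projections):
        # Draw a random direction on the unit sphere by normalizing a Gaussian.
        theta = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(t * t for t in theta))
        theta = [t / norm for t in theta]
        # Project both batches onto the direction and sort the 1-D values.
        px = sorted(sum(a * t for a, t in zip(x, theta)) for x in xs)
        py = sorted(sum(b * t for b, t in zip(y, theta)) for y in ys)
        # In one dimension, optimal transport pairs the sorted samples in order.
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    # Average the 1-D transport costs over all random directions.
    return total / n_projections
```

Each random direction reduces the problem to one dimension, where the optimal coupling is obtained by sorting. The estimate therefore depends only on the batch size and feature dimension, which is consistent with the abstract's remark that the SW distance works regardless of the number of classes.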