Deep neural networks, particularly in vision tasks, are notably susceptible to adversarial perturbations, which makes developing robust classifiers crucial. Motivated by recent advances in classifier robustness, we examine two pivotal defenses: adversarial training and Jacobian regularization. Our work is the first to carefully analyze and characterize these two schools of approaches, both theoretically and empirically, to demonstrate how each impacts the robust learning of a classifier. We then propose a novel Optimal Transport with Jacobian regularization method, dubbed OTJR, which bridges input Jacobian regularization with output representation alignment by leveraging optimal transport theory. In particular, we employ the Sliced Wasserstein (SW) distance, which efficiently pushes the representations of adversarial samples closer to those of clean samples, regardless of the number of classes in the dataset. The SW distance also provides the movement directions of the adversarial samples, which are much more informative and powerful for Jacobian regularization. Our empirical evaluations set a new standard in the domain, with our method achieving accuracies of 52.57% on CIFAR-10 and 28.3% on CIFAR-100 under AutoAttack. To further validate our model's practicality, we conducted real-world tests by subjecting internet-sourced images to online adversarial attacks. These demonstrations highlight our model's capability to counteract sophisticated adversarial perturbations, affirming its applicability in real-world scenarios.
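To make the role of the Sliced Wasserstein distance more concrete, the following is a minimal NumPy sketch of a Monte Carlo SW estimate between two equally sized batches of feature vectors. It is an illustration under stated assumptions, not the OTJR implementation: the function name `sliced_wasserstein`, its parameters (`n_projections`, `p`, `seed`), and the random arrays standing in for clean and adversarial representations are all hypothetical. The sketch relies on the standard fact that one-dimensional optimal transport between equal-size empirical distributions reduces to matching sorted values, so the cost depends on the batch size and feature dimension rather than on the number of classes.

```python
import numpy as np


def sliced_wasserstein(x, y, n_projections=128, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance.

    x, y: arrays of shape (n_samples, n_features) with identical shapes.
    All names and defaults here are illustrative assumptions.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 2 or x.shape != y.shape:
        raise ValueError("x and y must be 2-D arrays of the same shape")

    rng = np.random.default_rng(seed)
    # Draw random directions and normalize them onto the unit sphere.
    theta = rng.standard_normal((n_projections, x.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)

    # Project both point sets onto every direction: shape (n_samples, n_projections).
    x_proj = x @ theta.T
    y_proj = y @ theta.T

    # In 1-D, the optimal coupling between equal-size samples pairs sorted values.
    x_sorted = np.sort(x_proj, axis=0)
    y_sorted = np.sort(y_proj, axis=0)

    # Average the p-th power cost over samples and projections, then take the p-th root.
    return np.mean(np.abs(x_sorted - y_sorted) ** p) ** (1.0 / p)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    clean = rng.standard_normal((64, 16))                    # stand-in for clean representations
    adversarial = clean + 0.1 * rng.standard_normal((64, 16))  # stand-in for perturbed ones
    print(sliced_wasserstein(clean, adversarial))
```

In a training setting, a differentiable version of this quantity (for example, written in an autodiff framework) could serve as an alignment loss between adversarial and clean representations; how OTJR combines it with the Jacobian term is not specified in this abstract and is not reproduced here.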