The Web, as a rich medium of diverse content, is constantly threatened by malicious entities that exploit its vulnerabilities, especially as deep learning applications proliferate across web services. One such vulnerability, crucial to the fidelity and integrity of web content, is the susceptibility of deep neural networks to adversarial perturbations, particularly on images, a dominant form of data on the web. In light of recent advances in classifier robustness, we examine in depth two pivotal defenses: adversarial training (AT) and Jacobian regularization. Our work is the first to carefully analyze and characterize these two schools of approaches, both theoretically and empirically, and to demonstrate how each one affects the robust learning of a classifier. Next, we propose a novel Optimal Transport with Jacobian regularization method, dubbed~\SystemName, which jointly incorporates input-output Jacobian regularization into AT by leveraging optimal transport theory. In particular, we employ the Sliced Wasserstein (SW) distance, which efficiently pushes the representations of adversarial samples closer to those of clean samples, regardless of the number of classes in the dataset. The SW distance provides movement directions for the adversarial samples that are far more informative and effective for guiding the Jacobian regularization. Our empirical evaluations set a new standard in the domain: under AutoAttack, our method achieves robust accuracies of 51.41\% on the~\CIFAR-10 dataset and 28.49\% on the~\CIFAR-100 dataset. In a real-world demonstration, we subject images sourced from the Internet to online adversarial attacks, confirming the efficacy and relevance of our model in defending against sophisticated web-image perturbations.
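To make the SW term concrete, the sketch below shows the generic Monte Carlo estimate of the sliced 2-Wasserstein distance between a batch of clean representations and a batch of adversarial representations. It is a minimal illustration under stated assumptions, not the authors' implementation: it assumes equal-size batches with uniform weights, random projection directions drawn uniformly on the unit sphere, and the hypothetical function name `sliced_wasserstein` with hypothetical parameters `num_projections`, `p`, and `seed`. Each projection reduces the problem to one dimension, where optimal transport between equal-size empirical measures simply matches sorted values, which is what makes SW cheap to compute.

```python
import numpy as np


def sliced_wasserstein(x, y, num_projections=128, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance between two
    equal-size point clouds x and y of shape (n, d), e.g. clean and adversarial
    feature batches (hypothetical helper, uniform weights assumed)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 2:
        raise ValueError("x and y must both have shape (n, d)")
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Draw random directions and normalize them onto the unit sphere S^{d-1}.
    theta = rng.normal(size=(num_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project every sample onto every direction: shape (n, num_projections).
    x_proj = x @ theta.T
    y_proj = y @ theta.T
    # In 1-D, the optimal coupling of equal-size empirical measures pairs
    # sorted values, so sort each projected column independently.
    x_sorted = np.sort(x_proj, axis=0)
    y_sorted = np.sort(y_proj, axis=0)
    # Average the p-th power transport cost over samples and directions,
    # then take the p-th root.
    cost = np.mean(np.abs(x_sorted - y_sorted) ** p)
    return cost ** (1.0 / p)


# Usage: a shifted copy of a batch is at a positive distance from the original.
rng = np.random.default_rng(1)
clean = rng.normal(size=(64, 8))
adversarial = clean + 0.5
print(sliced_wasserstein(clean, adversarial))
```

In an actual training loop this quantity would be computed on network features inside an autodiff framework so that minimizing it moves adversarial representations toward clean ones; NumPy is used here only to show the computation. Because the cost depends on feature geometry rather than class labels, its cost does not grow with the number of classes.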