It is well known that deep neural networks can be fooled by adversarial examples whose perturbations are imperceptible to humans. Various defenses have been proposed to improve adversarial robustness, among which adversarial training methods are the most effective. However, most of these methods treat training samples independently and require a very large number of samples to train a robust network, ignoring the latent structural information among the samples. In this work, we propose a novel Local Structure Preserving (LSP) regularization, which aims to preserve the local structure of the input space in the learned embedding space. In this way, the effect of adversarial samples lying in the vicinity of clean samples can be mitigated. We show strong empirical evidence that, with or without adversarial training, our method consistently improves adversarial robustness on several image classification datasets compared with the baselines and some state-of-the-art approaches, thus providing a promising direction for future research.
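To make the idea of preserving local input-space structure in the embedding space more concrete, the following is a minimal illustrative sketch and not the LSP formulation itself, which the abstract does not specify. It assumes a hypothetical penalty that, for each sample in a batch, compares mean-scaled squared distances to its k nearest input-space neighbours before and after embedding. The function names, the choice of Euclidean distance, the mean scaling, and the parameter `k` are all assumptions made for illustration.

```python
import numpy as np


def pairwise_sq_dists(a):
    """Squared Euclidean distances between all rows of `a`, shape (n, n)."""
    sq = np.sum(a * a, axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * (a @ a.T)
    return np.maximum(d, 0.0)  # clip tiny negatives caused by round-off


def local_structure_penalty(x, z, k=3):
    """Hypothetical local-structure penalty (an assumption, not the paper's loss).

    x: batch of inputs, shape (n, ...); z: their embeddings, shape (n, ...).
    Returns the mean squared gap between scaled input-space and
    embedding-space distances over each sample's k nearest input-space
    neighbours.
    """
    n = x.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < batch size")
    dx = pairwise_sq_dists(x.reshape(n, -1))
    dz = pairwise_sq_dists(z.reshape(n, -1))
    # Scale each matrix by its mean so the two spaces are comparable.
    dx = dx / (dx.mean() + 1e-12)
    dz = dz / (dz.mean() + 1e-12)
    np.fill_diagonal(dx, np.inf)  # a sample is not its own neighbour
    nbrs = np.argsort(dx, axis=1)[:, :k]  # k nearest neighbours in input space
    rows = np.arange(n)[:, None]
    gap = dx[rows, nbrs] - dz[rows, nbrs]
    return float(np.mean(gap ** 2))
```

Under these assumptions, an embedding that is a uniformly scaled copy of the input incurs essentially zero penalty, while an embedding unrelated to the input incurs a positive one. In training, a term of this kind would be added to the classification loss with a weighting coefficient, with or without adversarial training.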