Extensive evidence demonstrates that deep neural networks are vulnerable to adversarial examples, which motivates the development of defenses against adversarial attacks. Existing adversarial defenses typically improve model robustness against a single, specific perturbation type (\eg, $\ell_{\infty}$-norm bounded adversarial examples). In practice, however, adversaries are likely to generate multiple types of perturbations (\eg, $\ell_1$, $\ell_2$, and $\ell_{\infty}$ perturbations). Some recent methods improve model robustness against adversarial attacks in multiple $\ell_p$ balls, but their performance against each individual perturbation type is still far from satisfactory. In this paper, we observe that different $\ell_p$ bounded adversarial perturbations induce different statistical properties, which can be separated and characterized by the statistics of Batch Normalization (BN). We thus propose Gated Batch Normalization (GBN) to adversarially train a perturbation-invariant predictor for defending against multiple $\ell_p$ bounded adversarial perturbations. GBN consists of a multi-branch BN layer and a gated sub-network. Each BN branch in GBN is in charge of one perturbation type, which ensures that the normalized output is aligned towards learning a perturbation-invariant representation. Meanwhile, the gated sub-network is designed to separate inputs carrying different perturbation types. We perform an extensive evaluation of our approach on commonly used datasets, including MNIST, CIFAR-10, and Tiny-ImageNet, and demonstrate that GBN outperforms previous defense proposals against multiple perturbation types (\ie, $\ell_1$, $\ell_2$, and $\ell_{\infty}$ perturbations) by large margins.
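To make the structure of GBN concrete, the following is a minimal, dependency-free sketch of a multi-branch BN layer combined with a gated sub-network. It is an illustration under stated assumptions rather than the implementation evaluated in the paper: the class names `BNBranch` and `GatedBN`, the linear-plus-softmax gate, the soft mixing of branch outputs by gate probabilities, and all numeric values are hypothetical, and the learnable affine scale and shift of standard BN are omitted for brevity.

```python
import math

EPS = 1e-5  # small constant for numerical stability, as in standard BN


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class BNBranch:
    """One BN branch holding the feature statistics of one perturbation type."""

    def __init__(self, mean, var):
        self.mean = mean  # per-feature mean for this perturbation type
        self.var = var    # per-feature variance for this perturbation type

    def normalize(self, x):
        return [(xi - m) / math.sqrt(v + EPS)
                for xi, m, v in zip(x, self.mean, self.var)]


class GatedBN:
    """Hypothetical GBN layer: a gate softly selects among the BN branches."""

    def __init__(self, branches, gate_weights, gate_bias):
        self.branches = branches
        self.gate_weights = gate_weights  # assumption: one weight row per branch
        self.gate_bias = gate_bias        # assumption: one bias per branch

    def gate(self, x):
        """Return a probability per perturbation type for the input features."""
        logits = [sum(w * xi for w, xi in zip(row, x)) + b
                  for row, b in zip(self.gate_weights, self.gate_bias)]
        return softmax(logits)

    def forward(self, x):
        """Mix the branch-normalized features, weighted by the gate output."""
        probs = self.gate(x)
        outputs = [branch.normalize(x) for branch in self.branches]
        return [sum(p * out[j] for p, out in zip(probs, outputs))
                for j in range(len(x))]


# Toy setup: three branches standing in for l_1, l_2, and l_inf perturbations.
# The statistics and gate parameters below are made up for illustration.
branches = [
    BNBranch([0.0, 0.0], [1.0, 1.0]),
    BNBranch([1.0, 1.0], [4.0, 4.0]),
    BNBranch([2.0, 2.0], [9.0, 9.0]),
]
layer = GatedBN(
    branches,
    gate_weights=[[-5.0, -5.0], [0.0, 0.0], [5.0, 5.0]],
    gate_bias=[0.0, 0.0, -20.0],
)
probs = layer.gate([3.0, 3.0])
out = layer.forward([3.0, 3.0])
```

With these toy parameters the gate assigns almost all of its probability to the third branch, so the output is close to what that branch alone would produce. In a trained network the branch statistics and the gate parameters would be learned during adversarial training instead of being set by hand.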