It has been widely observed that neural networks are vulnerable to small additive perturbations of the input that cause misclassification. In this paper, we focus on $\ell_0$-bounded adversarial attacks and aim to theoretically characterize the performance of adversarial training for an important class of truncated classifiers. Such classifiers have been shown to perform well in the $\ell_0$-adversarial setting, both empirically and, for the Gaussian mixture model, theoretically. Our main contribution is a novel, distribution-independent generalization bound for binary classification under $\ell_0$-bounded adversarial perturbations. Deriving a generalization bound in this setting poses two main challenges: (i) the truncated inner product is highly non-linear; and (ii) the maximization over the $\ell_0$ ball that arises in adversarial training is non-convex and highly non-smooth. To tackle these challenges, we develop new coding techniques for bounding the combinatorial dimension of the truncated hypothesis class.
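As a rough illustration of why truncation helps against $\ell_0$-bounded perturbations, the sketch below assumes one common definition of the truncated inner product: sort the coordinate-wise products $w_i x_i$ and discard the $k$ largest and $k$ smallest before summing. The abstract does not state the definition, so this form, along with the names `truncated_inner_product`, `truncated_classify`, and the parameter `k`, is an assumption made for illustration only.

```python
def truncated_inner_product(w, x, k):
    """Assumed k-truncated inner product: sum the coordinate-wise
    products w[i] * x[i] after dropping the k largest and k smallest."""
    products = sorted(wi * xi for wi, xi in zip(w, x))
    if 2 * k >= len(products):
        return 0.0  # nothing is left after truncation
    return float(sum(products[k:len(products) - k]))


def truncated_classify(w, b, x, k):
    """Hypothetical binary truncated classifier: sign of the truncated score plus bias b."""
    return 1 if truncated_inner_product(w, x, k) + b >= 0 else -1


def plain_inner_product(w, x):
    """Ordinary inner product, for comparison."""
    return float(sum(wi * xi for wi, xi in zip(w, x)))


w = [1.0, 1.0, 1.0, 1.0, 1.0]
x_clean = [1.0, 1.0, 1.0, 1.0, 1.0]
# An l0-bounded perturbation: one coordinate changed by an arbitrary amount.
x_adv = [1.0, 1.0, -100.0, 1.0, 1.0]

clean_score = truncated_inner_product(w, x_clean, k=1)
adv_score = truncated_inner_product(w, x_adv, k=1)
plain_adv_score = plain_inner_product(w, x_adv)
```

In this toy case the single corrupted coordinate flips the sign of the ordinary inner product, while the truncated score is unchanged because the outlier product is among those discarded. The sorting step is also what makes the map highly non-linear in $x$.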