Neural networks are vulnerable to adversarial attacks: adding well-crafted, imperceptible perturbations to their input can change their output. Adversarial training is one of the most effective approaches to training models that are robust against such attacks. Unfortunately, it is much slower than vanilla training of neural networks, since it must construct adversarial examples for the entire training set at every iteration. By leveraging the theory of coreset selection, we show how selecting a small subset of the training data provides a principled way to reduce the time complexity of robust training. To this end, we first provide convergence guarantees for adversarial coreset selection. In particular, we show that the convergence bound is directly related to how well our coresets approximate the gradient computed over the entire training set. Motivated by this analysis, we propose using the gradient approximation error as our adversarial coreset selection objective, which lets us reduce the training set size effectively. Once the coreset is built, we run adversarial training on this subset of the training data. Unlike existing methods, our approach can be adapted to a wide variety of training objectives, including TRADES, $\ell_p$-PGD, and Perceptual Adversarial Training. Extensive experiments demonstrate that our approach speeds up adversarial training by a factor of 2 to 3, with only a slight degradation in clean and robust accuracy.
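The idea of using the gradient approximation error as a selection objective can be illustrated with a minimal sketch. It assumes per-example loss gradients are already available as plain vectors (in an adversarial setting they would presumably be computed on the perturbed inputs), and it swaps in a CRAIG-style greedy facility-location heuristic, which is one standard way to control this error; the paper's actual selection procedure may differ, and the names `greedy_gradient_coreset` and `approximation_error` are hypothetical. The heuristic rests on a simple bound: if every example $i$ is assigned to its nearest selected example $\sigma(i) \in S$ and each $j \in S$ receives weight $w_j = |\{i : \sigma(i) = j\}|$, the triangle inequality gives $\|\sum_i g_i - \sum_{j \in S} w_j g_j\| \le \sum_i \min_{j \in S} \|g_i - g_j\|$, so greedily minimizing the right-hand side keeps the approximation error small.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def greedy_gradient_coreset(grads: List[Vector], k: int) -> Tuple[List[int], List[int]]:
    """Hypothetical CRAIG-style selection (not necessarily the paper's method).

    Greedily picks k indices S that minimize sum_i min_{j in S} ||g_i - g_j||,
    an upper bound on the gradient approximation error. Returns the selected
    indices and, for each one, the number of examples it stands in for.
    """
    n = len(grads)
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= len(grads)")
    dist = [[math.dist(grads[i], grads[j]) for j in range(n)] for i in range(n)]
    nearest = [math.inf] * n  # distance from each example to its closest selected gradient
    selected: List[int] = []
    for _ in range(k):
        best_j, best_cost = -1, math.inf
        for j in range(n):
            if j in selected:
                continue
            # Facility-location cost if j were added to the coreset.
            cost = sum(min(nearest[i], dist[i][j]) for i in range(n))
            if cost < best_cost:
                best_j, best_cost = j, cost
        selected.append(best_j)
        nearest = [min(nearest[i], dist[i][best_j]) for i in range(n)]
    # Weight of each selected example = number of examples assigned to it.
    weights = [0] * k
    for i in range(n):
        closest = min(range(k), key=lambda s: dist[i][selected[s]])
        weights[closest] += 1
    return selected, weights


def approximation_error(grads: List[Vector], selected: List[int], weights: List[int]) -> float:
    """Norm of (full-data gradient sum) minus (weighted coreset gradient sum)."""
    dim = len(grads[0])
    full = [sum(g[d] for g in grads) for d in range(dim)]
    approx = [sum(w * grads[j][d] for j, w in zip(selected, weights)) for d in range(dim)]
    return math.dist(full, approx)


if __name__ == "__main__":
    # Toy per-example gradients forming two tight clusters.
    grads = [[1.0, 0.0], [1.1, 0.0], [0.0, 1.0], [0.0, 1.1]]
    selected, weights = greedy_gradient_coreset(grads, k=2)
    print(selected, weights, approximation_error(grads, selected, weights))
```

On the toy input, the greedy loop picks one gradient from each cluster and gives each a weight of 2, so the weighted coreset sum stays close to the full gradient sum. Because every candidate is re-scored at each step, this sketch costs $O(k n^2)$ and is meant only to make the objective concrete; a practical version would likely need a faster variant such as lazy greedy evaluation.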