Adversarial Training (AT) has been found to substantially improve the robustness of deep learning classifiers against adversarial attacks. AT obtains robustness by including adversarial examples when training a classifier. Most variants of AT treat every training example equally. However, recent works have shown that better performance is achievable by treating them unequally. In addition, it has been observed that AT exerts an uneven influence on different classes in a training set and unfairly hurts examples from classes that are inherently harder to classify. Consequently, various reweighting schemes have been proposed that assign unequal weights to the robust losses of individual examples in a training set. In this work, we propose a novel instance-wise reweighting scheme. It considers the vulnerability of each natural example and the information loss that adversarial attacks cause on its adversarial counterpart. Through extensive experiments, we show that our proposed method significantly improves over existing reweighting schemes, especially against strong white-box and black-box attacks.
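To make the idea of instance-wise reweighting concrete, the following is a minimal sketch of a reweighted robust loss, not the scheme proposed in this work. The abstract does not give the weighting formula, so everything here is an assumption made for illustration: vulnerability is taken to be one minus the predicted probability of the true class on the natural example, information loss is taken to be the KL divergence between the natural and adversarial predictions, and the function names `instance_weights` and `reweighted_robust_loss` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions, clamped at zero."""
    value = sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
    return max(0.0, value)


def instance_weights(nat_logits, adv_logits, labels):
    """Hypothetical per-example weights (an assumption, not the paper's formula).

    score = (1 - p_nat[y]) + KL(p_nat || p_adv), rescaled so the weights
    average to 1 over the batch.
    """
    scores = []
    for z_nat, z_adv, y in zip(nat_logits, adv_logits, labels):
        p_nat, p_adv = softmax(z_nat), softmax(z_adv)
        vulnerability = 1.0 - p_nat[y]            # assumed proxy for vulnerability
        info_loss = kl_divergence(p_nat, p_adv)   # assumed proxy for information loss
        scores.append(vulnerability + info_loss)
    n = len(scores)
    total = sum(scores)
    if total == 0.0:
        return [1.0] * n  # fall back to equal weighting
    return [n * s / total for s in scores]


def reweighted_robust_loss(nat_logits, adv_logits, labels):
    """Weighted mean of per-example cross-entropy on the adversarial logits."""
    weights = instance_weights(nat_logits, adv_logits, labels)
    losses = [-math.log(softmax(z_adv)[y] + 1e-12)
              for z_adv, y in zip(adv_logits, labels)]
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)


# Toy batch: example 0 is confidently classified and barely moved by the attack;
# example 1 is less confident and flipped by the attack.
nat = [[4.0, 0.0], [1.0, 0.5]]
adv = [[3.5, 0.0], [0.0, 1.5]]
labels = [0, 0]
weights = instance_weights(nat, adv, labels)
loss = reweighted_robust_loss(nat, adv, labels)
```

In this toy batch the second example receives the larger weight, since it is both less confidently classified and more strongly perturbed. In an actual training loop the weights would typically be treated as constants, with gradients flowing only through the robust loss; that detail is also an assumption here.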