Assigning importance weights to adversarial data has achieved great success in training adversarially robust networks under limited model capacity. However, existing instance-reweighted adversarial training (AT) methods rely heavily on heuristics and/or geometric interpretations to determine these importance weights, leaving the resulting algorithms without rigorous theoretical justification or guarantees. Moreover, recent research has shown that adversarial training suffers from severely non-uniform robust performance across the training distribution; for example, data points belonging to some classes can be much more vulnerable to adversarial attacks than others. To address both issues, we propose a novel doubly robust instance-reweighted AT framework, which obtains the importance weights by leveraging distributionally robust optimization (DRO) techniques and, at the same time, boosts the robustness of the most vulnerable examples. In particular, our importance weights are obtained by optimizing a KL-divergence-regularized loss function, which allows us to devise new algorithms with theoretical convergence guarantees. Experiments on standard classification datasets demonstrate that our approach outperforms related state-of-the-art baselines in average robust performance while also improving robustness against attacks on the weakest data points. Code will be available soon.
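To make the idea of KL-regularized importance weights concrete, here is a minimal sketch, assuming the weights w live on the probability simplex over n training examples and the inner problem maximizes sum_i w_i * l_i - r * KL(w || uniform), where l_i is the adversarial loss of example i and r > 0 is a regularization strength (both symbols are our assumption). This is the standard closed form for KL-regularized DRO, w_i proportional to exp(l_i / r), not necessarily the exact formulation or algorithm used in the paper; the function names `kl_dro_weights` and `reweighted_loss` are hypothetical.

```python
import math
from typing import List


def kl_dro_weights(losses: List[float], r: float) -> List[float]:
    """Hypothetical helper: maximizer of sum_i w_i*l_i - r*KL(w || uniform) on the simplex.

    Assumed setup (not taken from the paper): the closed-form solution is a
    softmax of the per-example losses at temperature r.
    """
    if r <= 0:
        raise ValueError("regularization strength r must be positive")
    scaled = [l / r for l in losses]
    m = max(scaled)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def reweighted_loss(losses: List[float], r: float) -> float:
    """Hypothetical helper: weighted average of losses under the KL-DRO weights."""
    w = kl_dro_weights(losses, r)
    return sum(wi * li for wi, li in zip(w, losses))


if __name__ == "__main__":
    adv_losses = [0.2, 0.5, 2.0]  # illustrative per-example adversarial losses
    for r in (0.1, 1.0, 10.0):
        print(r, kl_dro_weights(adv_losses, r), reweighted_loss(adv_losses, r))
```

Under this assumption, r interpolates between two extremes: as r grows the weights approach uniform (ordinary averaged AT), and as r shrinks they concentrate on the highest-loss, most vulnerable examples, which is one way to read the "doubly robust" goal of improving both average and worst-case robustness.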