Adversarial training is an important topic in robust deep learning, but the community has paid little attention to its practical use. In this paper, we address a real-world challenge: training a model on an imbalanced and noisy dataset so that it achieves both high clean accuracy and high robustness. To this end, we propose Omnipotent Adversarial Training (OAT). Our strategy consists of two components that address label noise and data imbalance in the training set, respectively. First, we introduce an oracle into the adversarial training process to help the model learn the correct data-label conditional distribution. This carefully designed oracle provides correct label annotations for adversarial training. Second, we propose logits adjustment adversarial training to overcome data imbalance, which helps the model learn a Bayes-optimal distribution. Our comprehensive evaluation shows that, under complex combinations of data imbalance and label noise, OAT outperforms other baselines by more than 20% in clean accuracy and 10% in robust accuracy. The code is available at https://github.com/GuanlinLee/OAT.
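As a rough illustration of the logit adjustment idea mentioned above, the sketch below implements the generic logit-adjusted cross-entropy loss, in which the log class prior is added to each logit before the softmax. It is not taken from the OAT code base: the function names `class_priors` and `logit_adjusted_cross_entropy` and the temperature `tau` are assumptions made for this example, and the adversarial example generation and the oracle are omitted entirely.

```python
import math

def class_priors(labels, num_classes):
    """Empirical class frequencies of the training labels.

    Assumes every class appears at least once, so log(prior) is finite.
    """
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    total = len(labels)
    return [c / total for c in counts]

def logit_adjusted_cross_entropy(logits, label, priors, tau=1.0):
    """Cross-entropy computed on logits shifted by tau * log(prior).

    `tau` is a hypothetical scaling knob; tau=0 recovers plain cross-entropy.
    """
    adjusted = [z + tau * math.log(p) for z, p in zip(logits, priors)]
    # Numerically stable log-sum-exp over the adjusted logits.
    m = max(adjusted)
    log_norm = m + math.log(sum(math.exp(a - m) for a in adjusted))
    return log_norm - adjusted[label]

# Toy imbalanced label set: 90 samples of class 0, 10 samples of class 1.
labels = [0] * 90 + [1] * 10
priors = class_priors(labels, num_classes=2)

# With uninformative logits, the tail class incurs the larger loss,
# so training pushes the model to enlarge the margin for rare classes.
logits = [0.0, 0.0]
loss_head = logit_adjusted_cross_entropy(logits, 0, priors)
loss_tail = logit_adjusted_cross_entropy(logits, 1, priors)
```

In this kind of scheme the adjustment is applied only during training, and the raw logits are used at inference time. In an adversarial training setting, the `logits` would presumably be the model outputs on adversarial examples, but that pairing is an assumption here rather than a description of OAT's implementation.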