Supervised learning models face two distinct challenges: intrinsic complexities of the training data, such as outliers and minority subpopulations, and intentional attacks at inference time with adversarial samples. Traditional robust learning methods and recent adversarial training approaches each address one of these challenges, but to date no work has developed models that are simultaneously robust to low-quality training data and to potential adversarial attacks at inference time. We therefore introduce Outlier Robust Adversarial Training (ORAT). ORAT is based on a bi-level optimization formulation of adversarial training with a robust rank-based loss function. Theoretically, we show that the learning objective of ORAT satisfies $\mathcal{H}$-consistency in binary classification, which establishes it as a proper surrogate for the adversarial 0/1 loss. Furthermore, we analyze its generalization ability and provide uniform convergence rates that hold with high probability. ORAT can be optimized with a simple algorithm. Experimental evaluations on three benchmark datasets demonstrate the effectiveness and robustness of ORAT in handling outliers and adversarial attacks. Our code is available at https://github.com/discovershu/ORAT.
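To make the combination of an adversarial inner problem and a rank-based outer loss concrete, the following is a minimal sketch, not the ORAT objective itself. The abstract does not specify the loss, so everything here is an assumption: the function names (`logistic_loss`, `adversarial_margin`, `ranked_range_loss`) are hypothetical, the model is restricted to a linear classifier so that the worst-case $\ell_\infty$ perturbation has a closed form, and the rank-based aggregate is an illustrative "drop the top m, average the next k - m" rule standing in for whatever rank-based loss ORAT actually uses. See the released code for the authors' method.

```python
import math


def logistic_loss(margin):
    """Numerically stable log(1 + exp(-margin))."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def adversarial_margin(w, b, x, y, eps):
    """Worst-case margin of a linear classifier under an L-infinity attack.

    For f(x) = w.x + b and a label y in {-1, +1}, the minimum of
    y * f(x + d) over ||d||_inf <= eps equals y * f(x) - eps * ||w||_1.
    This closed form plays the role of the inner maximization (assumption:
    linear model; a deep model would need an iterative attack instead).
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return y * score - eps * sum(abs(wi) for wi in w)


def ranked_range_loss(losses, m, k):
    """Average of the losses ranked m+1 .. k in descending order.

    The m largest losses are discarded as suspected outliers; the next
    k - m largest are averaged. Illustrative choice, not taken from the paper.
    """
    if not 0 <= m < k <= len(losses):
        raise ValueError("require 0 <= m < k <= len(losses)")
    ranked = sorted(losses, reverse=True)
    return sum(ranked[m:k]) / (k - m)


def robust_adversarial_objective(w, b, data, eps, m, k):
    """Outer objective: rank-based aggregate of per-sample adversarial losses."""
    losses = [logistic_loss(adversarial_margin(w, b, x, y, eps)) for x, y in data]
    return ranked_range_loss(losses, m, k)


if __name__ == "__main__":
    # Toy data: the last point carries a flipped label and acts as an outlier.
    data = [([2.0, 1.0], 1), ([1.5, 0.5], 1), ([-2.0, -1.0], -1), ([3.0, 2.0], -1)]
    w, b = [1.0, 0.5], 0.0
    print(robust_adversarial_objective(w, b, data, eps=0.1, m=1, k=3))
```

In this sketch the outlier produces the largest adversarial loss and is excluded by setting `m=1`, while the perturbation radius `eps` controls how pessimistic each remaining per-sample loss is. Both `m` and `k` are hyperparameters of the illustration, and their values here are arbitrary.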