Although adversarial training (AT) has proven effective at improving model robustness, the recently revealed issue of robust fairness remains largely unaddressed: robust accuracy varies significantly across classes. In this paper, rather than evaluating only the model's class-averaged performance, we study robust fairness by considering the worst-case distribution over classes. We propose a novel learning paradigm, Fairness-Aware Adversarial Learning (FAAL). FAAL generalizes conventional AT by recasting adversarial training as a min-max-max problem, which targets both the robustness and the fairness of the trained model. Specifically, by leveraging distributionally robust optimization, our method seeks the worst-case distribution over classes, and the resulting solution is guaranteed to attain the upper-bound performance with high probability. Notably, FAAL can fine-tune an unfair robust model into a fair one within only two epochs, without compromising overall clean and robust accuracy. Extensive experiments on various image datasets validate the superior performance and efficiency of FAAL compared with other state-of-the-art methods.
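To make the idea of a worst-case distribution over classes concrete, the following is a minimal sketch and not the FAAL algorithm itself. It assumes a KL-penalized variant of the inner class-level maximization, max over q of sum_c q_c * loss_c - tau * KL(q || uniform), whose maximizer has the well-known closed form q_c proportional to exp(loss_c / tau). The function names, the temperature `tau`, and the per-class loss values are all hypothetical; the abstract does not specify the ambiguity set or the solver actually used.

```python
import math


def worst_case_class_weights(class_losses, tau=1.0):
    """Maximizer of sum_c q_c * loss_c - tau * KL(q || uniform) over the simplex.

    Assumption: a KL penalty stands in for whatever ambiguity set the
    paper uses. The closed form is q_c proportional to exp(loss_c / tau).
    """
    peak = max(class_losses)  # subtract the max for numerical stability
    scores = [math.exp((loss - peak) / tau) for loss in class_losses]
    total = sum(scores)
    return [score / total for score in scores]


def reweighted_loss(class_losses, tau=1.0):
    """Class-weighted loss under the worst-case weights (outer objective)."""
    weights = worst_case_class_weights(class_losses, tau)
    return sum(w * loss for w, loss in zip(weights, class_losses))


# Hypothetical per-class adversarial losses (class 2 is the weakest class).
class_losses = [0.4, 0.5, 1.8, 0.6]
weights = worst_case_class_weights(class_losses, tau=0.5)
uniform_loss = sum(class_losses) / len(class_losses)
fair_loss = reweighted_loss(class_losses, tau=0.5)
```

In this sketch the weights concentrate on the class with the highest adversarial loss, so `fair_loss` is never smaller than `uniform_loss`. A small `tau` approaches the pure worst-class objective, while a large `tau` recovers the uniform average used in conventional AT.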