Adversarial Training (AT) has been widely shown to be an effective method for improving the robustness of Deep Neural Networks (DNNs) against adversarial examples. As a variant of AT, Adversarial Robustness Distillation (ARD) has demonstrated superior performance in improving the robustness of small student models under the guidance of large teacher models. However, both AT and ARD suffer from the robust fairness problem: these models exhibit strong robustness on some classes (easy classes) but weak robustness on others (hard classes). In this paper, we present an in-depth analysis of the potential factors and argue, from both empirical observation and theoretical analysis, that the smoothness degree of samples' soft labels for different classes (i.e., hard classes vs. easy classes) affects the robust fairness of DNNs. Based on this finding, we propose an Anti-Bias Soft Label Distillation (ABSLD) method to mitigate the adversarial robust fairness problem within the framework of Knowledge Distillation (KD). Specifically, ABSLD adaptively reduces the student's error-risk gap between different classes to achieve fairness by adjusting the class-wise smoothness degree of samples' soft labels during training, where the smoothness degree of the soft labels is controlled by assigning different temperatures in KD to different classes. Extensive experiments demonstrate that ABSLD outperforms state-of-the-art AT, ARD, and robust fairness methods on the comprehensive metric (Normalized Standard Deviation) of robustness and fairness.
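The core mechanism described above (per-class temperatures shaping the smoothness of teacher soft labels) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; the function name and the choice of temperature values are hypothetical, assuming only the standard convention that a higher softmax temperature yields smoother (less peaked) soft labels.

```python
import numpy as np

def soft_labels_with_class_temperature(teacher_logits, labels, temps):
    """Temperature-scaled softmax where the temperature depends on each
    sample's ground-truth class (hypothetical illustration, not ABSLD itself).
    Higher temperature -> smoother soft labels; lower -> sharper."""
    t = temps[labels][:, None]              # per-sample temperature from its class
    z = teacher_logits / t
    z = z - z.max(axis=1, keepdims=True)    # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Two samples over three classes (toy logits).
logits = np.array([[4.0, 1.0, 0.5],
                   [0.2, 3.5, 0.3]])
labels = np.array([0, 1])
# Assumed assignment: a hard class gets a low temperature (sharper target),
# an easy class a high one (smoother target); values here are illustrative.
temps = np.array([0.5, 2.0, 1.0])

p = soft_labels_with_class_temperature(logits, labels, temps)
```

With these values, the class-0 sample's target is sharper than a plain (T=1) softmax would give, while the class-1 sample's target is smoother, which is the kind of class-wise asymmetry the method exploits to rebalance error risk.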