Adversarial training is an effective technique for improving the robustness of deep neural networks. This study theoretically investigates how adversarial training affects the fairness, robustness, and generalization of deep learning models under a more general perturbation scope, in which different samples can have different perturbation directions (adversarial or anti-adversarial) and different perturbation bounds. Our theoretical analysis suggests that, compared with standard adversarial training, combining adversaries with anti-adversaries (samples with anti-adversarial perturbations) during training can achieve better fairness between classes and a better tradeoff between robustness and generalization in some typical learning scenarios (e.g., noisy-label learning and imbalanced learning). Based on these findings, we present a more general learning objective that combines adversaries and anti-adversaries with varied bounds on each training sample, and we use meta-learning to optimize the combination weights. Experiments on benchmark datasets under different learning scenarios verify our theoretical findings and the effectiveness of the proposed method.
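To make the distinction between adversarial and anti-adversarial perturbations concrete, the following is a minimal sketch and not the paper's implementation. It assumes a linear logistic-regression model, labels in {-1, +1}, an L-infinity bound, and a single signed-gradient step; the function names `logistic_loss` and `perturb` and the arrays `eps` and `direction` are hypothetical. Each sample receives its own bound and its own direction: +1 moves the sample so that its loss increases (an adversary), and -1 moves it so that its loss decreases (an anti-adversary).

```python
import numpy as np


def logistic_loss(w, b, x, y):
    """Per-sample logistic loss log(1 + exp(-y * (x.w + b))) for y in {-1, +1}."""
    margin = y * (x @ w + b)
    return np.logaddexp(0.0, -margin)


def perturb(w, b, x, y, eps, direction):
    """Apply one signed-gradient step to every sample (illustrative only).

    eps[i]       : L-infinity perturbation bound for sample i (assumed >= 0)
    direction[i] : +1 for an adversarial step (loss goes up),
                   -1 for an anti-adversarial step (loss goes down)
    """
    margin = y * (x @ w + b)
    # Gradient of the logistic loss with respect to the input x:
    # d loss / d x = -y * sigmoid(-margin) * w
    coeff = -y / (1.0 + np.exp(margin))
    grad_x = coeff[:, None] * w[None, :]
    step = (direction * eps)[:, None] * np.sign(grad_x)
    return x + step


# Toy data: six samples, three features.
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 3))
y = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
w = rng.normal(size=3)
b = 0.1

# Per-sample bounds and directions: the first three samples become
# adversaries, the last three become anti-adversaries.
eps = np.array([0.05, 0.10, 0.20, 0.05, 0.10, 0.20])
direction = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

x_mixed = perturb(w, b, x, y, eps, direction)
clean = logistic_loss(w, b, x, y)
mixed = logistic_loss(w, b, x_mixed, y)
print("loss change per sample:", mixed - clean)
```

In this sketch the directions and bounds are fixed by hand. The abstract states that the proposed method combines adversaries and anti-adversaries on each training sample and uses meta-learning to optimize the combination weights; that optimization is not reproduced here.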