Adversarial training is an effective learning technique for improving the robustness of deep neural networks. In this study, the influence of adversarial training on deep learning models in terms of fairness, robustness, and generalization is theoretically investigated under a more general perturbation scope, in which different samples can have different perturbation directions (adversarial or anti-adversarial) and different perturbation bounds. Our theoretical explorations suggest that, compared with standard adversarial training, combining adversaries and anti-adversaries (samples with anti-adversarial perturbations) in training can be more effective in achieving better fairness between classes and a better tradeoff between robustness and generalization in some typical learning scenarios (e.g., noisy label learning and imbalanced learning). On the basis of our theoretical findings, a more general learning objective is presented that combines adversaries and anti-adversaries with varied bounds on each training sample. Meta-learning is utilized to optimize the combination weights. Experiments on benchmark datasets under different learning scenarios verify our theoretical findings and the effectiveness of the proposed methodology.
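To make the notion of per-sample perturbation directions and bounds concrete, the following is a minimal sketch and not the method proposed in this work. It assumes a linear logistic model with labels in {-1, +1} and a single signed-gradient step; the names `logistic_loss`, `input_gradient`, `perturb`, `eps`, and `direction` are hypothetical and chosen only for illustration. A sample with `direction = +1` is moved so that its loss increases (an adversary), whereas a sample with `direction = -1` is moved so that its loss decreases (an anti-adversary). The meta-learned combination weights are not modeled here.

```python
import numpy as np


def logistic_loss(w, b, x, y):
    """Per-sample logistic loss for labels y in {-1, +1}."""
    margin = y * (x @ w + b)
    return np.logaddexp(0.0, -margin)


def input_gradient(w, b, x, y):
    """Gradient of the per-sample logistic loss with respect to the input x."""
    margin = y * (x @ w + b)
    coeff = -y / (1.0 + np.exp(margin))  # derivative of the loss w.r.t. the logit
    return coeff[:, None] * w[None, :]


def perturb(w, b, x, y, eps, direction):
    """One signed-gradient step with a per-sample bound and direction.

    eps:       per-sample L-infinity bound (assumed non-negative), shape (n,)
    direction: +1 for an adversarial step, -1 for an anti-adversarial step, shape (n,)
    """
    grad = input_gradient(w, b, x, y)
    return x + (direction * eps)[:, None] * np.sign(grad)


# Illustrative data only; none of these values come from the paper.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
y = np.array([1.0, -1.0, 1.0, -1.0])
w = np.array([0.5, -1.0, 2.0])
b = 0.1
eps = np.array([0.1, 0.2, 0.05, 0.3])          # varied bounds
direction = np.array([1.0, 1.0, -1.0, -1.0])   # two adversaries, two anti-adversaries

x_pert = perturb(w, b, x, y, eps, direction)
clean_loss = logistic_loss(w, b, x, y)
pert_loss = logistic_loss(w, b, x_pert, y)
print("clean loss:    ", clean_loss)
print("perturbed loss:", pert_loss)
```

For a linear model, the signed-gradient step changes the margin of each sample by exactly `-direction * eps * ||w||_1`, so the first two samples become harder and the last two become easier. For deep networks a multi-step attack would typically replace the single step used here.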