Adversarial training aims to reduce the problematic susceptibility of modern neural networks to small data perturbations. Surprisingly, overfitting is a major concern in adversarial training of neural networks despite being mostly absent in standard training. We provide theoretical evidence for this peculiar ``robust overfitting'' phenomenon. We then propose a novel loss function which we show, both theoretically and empirically, to enjoy a certified level of robustness against data evasion and poisoning attacks while guaranteeing generalization. Careful numerical experiments indicate that the resulting holistic robust (HR) training procedure yields state-of-the-art performance in terms of adversarial error loss. Finally, we show that HR training can be interpreted as a direct extension of adversarial training and comes with a negligible additional computational burden.