Self-ensemble adversarial training methods improve model robustness by ensembling models from different training epochs, for example via model weight averaging (WA). However, previous research has shown that self-ensemble defenses in adversarial training (AT) still suffer from robust overfitting, which severely degrades generalization performance. Empirically, in the late phases of training, AT overfits to the extent that the individual models used for weight averaging also overfit and produce anomalous weight values; because averaging fails to remove these weight anomalies, the self-ensemble model continues to undergo robust overfitting. To address this problem, we tackle the influence of outliers in the weight space and propose an easy-to-operate and effective Median-Ensemble Adversarial Training (MEAT) method that resolves the robust overfitting of self-ensemble defenses at its source by taking the median of the historical model weights. Experimental results show that MEAT achieves the best robustness against the powerful AutoAttack and can effectively alleviate robust overfitting. We further demonstrate that most defense methods can improve robust generalization and robustness by combining with MEAT.
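The core idea, replacing the weight average with an element-wise median over historical checkpoints, can be sketched as follows. This is a minimal illustration using NumPy; the parameter names and checkpoint values are hypothetical, not taken from the paper:

```python
import numpy as np

def median_ensemble(checkpoints):
    """Combine historical model weights by taking the element-wise median.

    checkpoints: list of dicts mapping parameter name -> np.ndarray,
    one dict per saved training epoch.
    """
    ensembled = {}
    for name in checkpoints[0]:
        # Stack the same parameter across epochs and take the median
        # along the checkpoint axis; unlike the mean (weight averaging),
        # the median is robust to anomalous weight values.
        stacked = np.stack([ckpt[name] for ckpt in checkpoints])
        ensembled[name] = np.median(stacked, axis=0)
    return ensembled

# Toy example: three "epochs" of a single two-weight layer, where the
# last checkpoint contains an anomalous value (as in late-phase AT).
ckpts = [
    {"w": np.array([1.0, 2.0])},
    {"w": np.array([1.1, 2.1])},
    {"w": np.array([1.2, 100.0])},  # outlier in the second weight
]
print(median_ensemble(ckpts)["w"])  # -> [1.1 2.1]; the outlier is ignored
```

Note how the mean of the second weight would be pulled to about 34.7 by the single anomalous value, while the median stays at 2.1, which is the motivation for using it in place of WA.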