Adversarial training (AT) has become an effective defense against adversarial examples (AEs) and is typically framed as a bi-level optimization problem. Among AT methods, fast adversarial training (FAT), which guides training with a single-step attack, achieves good robustness against adversarial attacks at low cost. However, FAT methods suffer from catastrophic overfitting, especially on complex tasks or with large-parameter models. In this work, we propose a FAT method, termed FGSM-PCO, that mitigates catastrophic overfitting by averting the collapse of the inner problem in the bi-level optimization. FGSM-PCO generates current-stage AEs from historical AEs and incorporates them into training through an adaptive mechanism that sets the fusion ratio according to how the AEs perform on the model being trained. Coupled with a loss function tailored to this training framework, FGSM-PCO alleviates catastrophic overfitting and helps an overfitted model recover to effective training. We evaluate the algorithm on three models and three datasets to validate its effectiveness. Comparative empirical studies against other FAT algorithms demonstrate that the proposed method effectively addresses overfitting issues left unresolved by existing algorithms.
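The core mechanism described above can be illustrated with a minimal sketch. This is not the paper's implementation: the fusion rule, the `ratio` argument standing in for the adaptive mechanism, and the function names are all illustrative assumptions; only the single-step FGSM form and the idea of mixing historical with current-stage perturbations come from the abstract.

```python
import numpy as np

def fgsm_perturbation(grad, epsilon):
    """Single-step FGSM: perturb along the sign of the loss gradient
    w.r.t. the input, scaled to the attack budget epsilon."""
    return epsilon * np.sign(grad)

def fuse_perturbations(delta_hist, delta_curr, ratio, epsilon):
    """Hypothetical fusion of historical and current-stage perturbations.

    `ratio` stands in for the paper's adaptive mechanism, which in
    FGSM-PCO is set according to the AEs' performance on the model
    being trained (details not given in the abstract)."""
    ratio = float(np.clip(ratio, 0.0, 1.0))
    delta = ratio * delta_hist + (1.0 - ratio) * delta_curr
    # Project back into the epsilon-ball so the fused AE stays a
    # valid bounded perturbation.
    return np.clip(delta, -epsilon, epsilon)

# Toy usage: fuse a stored historical perturbation with a fresh FGSM step.
grad = np.array([0.5, -2.0, 1.3])
delta_curr = fgsm_perturbation(grad, epsilon=0.1)
delta_hist = np.array([0.05, -0.1, -0.02])
delta = fuse_perturbations(delta_hist, delta_curr, ratio=0.3, epsilon=0.1)
```

Keeping part of the historical perturbation is what distinguishes this scheme from plain FGSM-AT, where each epoch's single-step AEs are independent and the inner maximization can collapse.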