Adversarial training improves robustness by incorporating adversarial examples into model training. However, it is usually studied on fixed datasets, whereas in real-world settings data accumulates incrementally. In this study, we investigate Adversarially Robust Class Incremental Learning (ARCIL), a setting that combines adversarial robustness with class-incremental learning. We observe that naively combining incremental learning with adversarial training readily leads to a loss of robustness. We find that this is attributable to the disappearance of the flatness of the loss function, a characteristic of adversarial training. To address this issue, we propose the Flatness Preserving Distillation (FPD) loss, which leverages the output difference between adversarial and clean examples. Additionally, we introduce the Logit Adjustment Distillation (LAD) loss, which adapts the model's knowledge to perform well on new tasks. Experimental results show that our method outperforms approaches that apply adversarial training to existing incremental learning methods, and that it provides a strong baseline for future work on adversarially robust incremental learning. On average, our method achieves AutoAttack accuracy that is 5.99\%p, 5.27\%p, and 3.90\%p higher than the baseline on split CIFAR-10, CIFAR-100, and Tiny ImageNet, respectively. The code will be made available.
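The abstract does not state the FPD formula, so the following is only a minimal sketch of the general idea of distilling the output difference between adversarial and clean examples from a previous-task model into the current model. The squared-error form, the function names output_gap and gap_distillation_loss, and the toy linear models are all assumptions made for illustration and should not be read as the authors' loss.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Model = Callable[[Sequence[float]], Vector]  # maps an input to output logits


def output_gap(model: Model, clean: Sequence[float], adversarial: Sequence[float]) -> Vector:
    """Difference between a model's outputs on an adversarial and a clean input."""
    out_clean = model(clean)
    out_adv = model(adversarial)
    return [a - c for a, c in zip(out_adv, out_clean)]


def gap_distillation_loss(old_model: Model, new_model: Model,
                          clean: Sequence[float], adversarial: Sequence[float]) -> float:
    """Mean squared error between the old and new models' output gaps.

    ASSUMPTION: the squared-error form is illustrative only; the abstract
    does not specify how the output difference is used.
    """
    old_gap = output_gap(old_model, clean, adversarial)
    new_gap = output_gap(new_model, clean, adversarial)
    return sum((n - o) ** 2 for n, o in zip(new_gap, old_gap)) / len(old_gap)


# Hypothetical toy "models": fixed linear maps standing in for networks.
def old_model(x: Sequence[float]) -> Vector:
    return [x[0] + x[1], x[0] - x[1]]


def new_model(x: Sequence[float]) -> Vector:
    return [2.0 * x[0] + x[1], x[0] - x[1]]


clean = [1.0, 2.0]
adversarial = [1.5, 2.0]  # clean input plus a small hand-picked perturbation
loss = gap_distillation_loss(old_model, new_model, clean, adversarial)
```

In this toy setup the loss is zero when the new model reacts to the perturbation exactly as the old one does, and grows as its response to the same perturbation drifts, which is the intuition behind preserving the local behaviour of the loss surface across tasks.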