Adversarial training integrates adversarial examples into model training to enhance robustness. However, it is usually applied to a fixed dataset, whereas in real-world settings data accumulates incrementally. In this study, we investigate Adversarially Robust Class Incremental Learning (ARCIL), which combines adversarial robustness with incremental learning. We observe that combining incremental learning with naive adversarial training easily leads to a loss of robustness. We find that this is due to the disappearance of the flatness of the loss function, a characteristic property of adversarial training. To address this issue, we propose the Flatness Preserving Distillation (FPD) loss, which leverages the output difference between adversarial and clean examples. Additionally, we introduce the Logit Adjustment Distillation (LAD) loss, which adapts the model's knowledge so that it performs well on new tasks. Experimental results show that our method outperforms approaches that apply adversarial training to existing incremental learning methods, providing a strong baseline for future research on adversarially robust incremental learning. On split CIFAR-10, CIFAR-100, and Tiny ImageNet, our method achieves AutoAttack accuracy that is on average 5.99, 5.27, and 3.90 percentage points higher than the baseline, respectively. The code will be made available.
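The abstract does not give the FPD formula. As an illustration only, the sketch below assumes one plausible reading: the frozen previous-task model's gap between its outputs on adversarial and clean examples serves as a distillation target for the current model. The function name `flatness_gap_distillation`, the use of softmax probabilities, and the squared-error penalty are all hypothetical choices, not the paper's definition.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis (subtract the row max first).
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def flatness_gap_distillation(old_clean, old_adv, new_clean, new_adv):
    """Hypothetical sketch, NOT the paper's FPD loss.

    Each argument is an (N, C) array of logits. The old (frozen) model's
    adversarial-minus-clean probability gap is the target; the loss is the
    squared difference between the new model's gap and that target, summed
    over classes and averaged over the batch.
    """
    old_gap = softmax(old_adv) - softmax(old_clean)
    new_gap = softmax(new_adv) - softmax(new_clean)
    return float(np.mean(np.sum((new_gap - old_gap) ** 2, axis=-1)))


# Usage: a new model that reproduces the old model's gap incurs zero loss,
# while one whose adversarial outputs collapse onto its clean outputs does not.
rng = np.random.default_rng(0)
clean = rng.normal(size=(4, 10))
adv = clean + rng.normal(scale=0.5, size=(4, 10))
print(flatness_gap_distillation(clean, adv, clean, adv))
print(flatness_gap_distillation(clean, adv, clean, clean))
```

Comparing probability gaps rather than raw logits is an assumption made here so that a constant shift of all logits leaves the loss unchanged; the paper may use a different distance or operate directly on logits.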