Class-incremental learning approaches are useful because they allow a model to learn new information (classes) sequentially while retaining previously acquired information (classes). However, such approaches have been shown to be extremely vulnerable to adversarial backdoor attacks. In these attacks, an intelligent adversary introduces a small amount of misinformation into the model during training, in the form of an imperceptible backdoor pattern, to cause deliberate forgetting of a specific task or class at test time. In this work, we propose a novel defensive framework to counter such an insidious attack. We turn the attacker's primary strength (hiding the backdoor pattern by making it imperceptible to humans) against it, and propose to learn a perceptible (stronger) pattern, also during training, that can overpower the attacker's imperceptible (weaker) pattern. We demonstrate the effectiveness of the proposed defensive mechanism with several commonly used replay-based (both generative and exact replay) class-incremental learning algorithms on continual learning benchmark variants of the CIFAR-10, CIFAR-100, and MNIST datasets. Most notably, our defensive framework does not assume that the defender knows the attacker's target task or target class. The defender is also unaware of the shape, size, and location of the attacker's pattern. We show that the proposed framework considerably improves the performance of class-incremental learning algorithms with no knowledge of the attacker's target task, target class, or imperceptible pattern. We term our defensive framework Adversary Aware Continual Learning (AACL).
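To make the contrast between an imperceptible (weaker) and a perceptible (stronger) pattern concrete, here is a minimal NumPy sketch. It is not the AACL training procedure. It only illustrates, under stated assumptions, how the two kinds of pattern differ in magnitude. The function names `add_imperceptible_trigger` and `add_perceptible_pattern`, the L-infinity budget `eps = 4/255`, the checkerboard pattern, and the 6x6 corner patch are all hypothetical choices made for illustration.

```python
import numpy as np

def add_imperceptible_trigger(x, trigger, eps=4 / 255):
    """Hypothetical attacker: add a low-amplitude pattern whose per-pixel
    magnitude is clipped to an L-infinity budget eps (assumed value)."""
    delta = np.clip(trigger, -eps, eps)
    # Keep pixel values in [0, 1]; this can only shrink the perturbation.
    return np.clip(x + delta, 0.0, 1.0)

def add_perceptible_pattern(x, pattern, mask):
    """Hypothetical defender: overwrite the masked pixels with a
    high-contrast pattern that is clearly visible to a human."""
    return np.where(mask, pattern, x)

rng = np.random.default_rng(0)
x = rng.random((32, 32, 3))                      # a stand-in 32x32 RGB image in [0, 1]

# Attacker's (assumed) full-image trigger, squashed into the eps budget.
trigger = rng.uniform(-1.0, 1.0, size=x.shape)
x_adv = add_imperceptible_trigger(x, trigger)

# Defender's (assumed) checkerboard patch in the top-left 6x6 corner.
checker = (np.indices((32, 32)).sum(axis=0) % 2).astype(float)[..., None]
mask = np.zeros((32, 32, 1), dtype=bool)
mask[:6, :6] = True
x_def = add_perceptible_pattern(x, checker, mask)

print("max change from attacker pattern:", np.abs(x_adv - x).max())
print("max change from defender pattern:", np.abs(x_def - x).max())
```

The attacker's change stays within `eps` everywhere, while the defender's patch changes some pixels by a large fraction of the full intensity range. That gap in signal strength is the intuition behind using a stronger, perceptible pattern to overpower a weaker, imperceptible one.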