We propose a new method for exemplar-free class-incremental training of ViTs. The main challenge of exemplar-free continual learning is maintaining the plasticity of the learner without causing catastrophic forgetting of previously learned tasks. This is often achieved via exemplar replay, which helps recalibrate previous-task classifiers to the feature drift that occurs when learning new tasks. Exemplar replay, however, comes at the cost of retaining samples from previous tasks, which may not be possible in many applications. To address the problem of continual ViT training, we first propose gated class-attention to minimize the drift in the final ViT transformer block. This mask-based gating is applied to the class-attention mechanism of the last transformer block and strongly regulates the weights crucial for previous tasks. Importantly, gated class-attention does not require the task-ID during inference, which distinguishes it from other parameter isolation methods. Secondly, we propose a new method of cascaded feature drift compensation that accommodates feature drift in the backbone when learning new tasks. The combination of gated class-attention and cascaded feature drift compensation allows for plasticity towards new tasks while limiting forgetting of previous ones. Extensive experiments performed on CIFAR-100, Tiny-ImageNet and ImageNet100 demonstrate that our exemplar-free method obtains competitive results when compared to rehearsal-based ViT methods.
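As a rough illustration of how mask-based gating can protect weights that matter for earlier tasks, the sketch below shows one common realization of the idea: near-binary per-unit gates, a cumulative mask over previous tasks, and gradient attenuation on protected units. This is an assumption made for illustration, not the formulation used in this work; the function names (`gate`, `cumulative_mask`, `protect_gradient`), the `scale` parameter, and the toy gate embeddings are all hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gate(embedding, scale=50.0):
    """Near-binary gate per unit: sigmoid(scale * e). Hypothetical helper."""
    return [sigmoid(scale * e) for e in embedding]


def cumulative_mask(previous_gates):
    """Element-wise maximum over the gates of all previous tasks."""
    return [max(units) for units in zip(*previous_gates)]


def protect_gradient(grad, cum_mask):
    """Scale each unit's gradient by (1 - cumulative mask), so that units
    earlier tasks rely on receive almost no update (assumed scheme)."""
    return [g * (1.0 - m) for g, m in zip(grad, cum_mask)]


# Toy gate embeddings for two earlier tasks over four units (made-up values).
task_gates = [
    gate([2.0, -2.0, -2.0, -2.0]),  # task 1 relies on unit 0
    gate([-2.0, 2.0, -2.0, -2.0]),  # task 2 relies on unit 1
]
cum = cumulative_mask(task_gates)

# A uniform gradient arriving while training a new task.
grad = [1.0, 1.0, 1.0, 1.0]
protected = protect_gradient(grad, cum)
```

In this toy setting, units 0 and 1 are claimed by earlier tasks and their gradients are suppressed, while units 2 and 3 remain free to adapt to the new task. That is the plasticity versus forgetting trade-off the abstract describes, under the stated assumptions.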