Continual learning requires learning incremental tasks under dynamic data distributions. It has been observed that training with a combination of contrastive loss and distillation loss yields strong performance in continual learning. However, to the best of our knowledge, this contrastive continual learning framework lacks a convincing theoretical explanation. In this work, we fill this gap by establishing theoretical performance guarantees, which reveal how the performance of the model is bounded by the training losses of previous tasks in the contrastive continual learning framework. Our theoretical analysis further supports the idea that pre-training can benefit continual learning. Inspired by these guarantees, we propose a novel contrastive continual learning algorithm called CILA, which uses adaptive distillation coefficients for different tasks. Each coefficient is easily computed as the ratio of the average distillation loss to the average contrastive loss on previous tasks. Our method shows substantial improvement on standard benchmarks and achieves new state-of-the-art performance.
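As a minimal sketch of the coefficient computation described above (not the authors' implementation; the function names, loss logging scheme, and the way the two losses are combined are assumptions for illustration), the adaptive coefficient is simply a ratio of running averages:

```python
import torch


def adaptive_distillation_coefficient(distill_losses: torch.Tensor,
                                      contrast_losses: torch.Tensor) -> torch.Tensor:
    """Adaptive coefficient: ratio of the average distillation loss to the
    average contrastive loss, both recorded on previous tasks."""
    return distill_losses.mean() / contrast_losses.mean()


def combined_loss(contrastive_loss: torch.Tensor,
                  distillation_loss: torch.Tensor,
                  coeff: torch.Tensor) -> torch.Tensor:
    # Hypothetical combined objective for the current task: the contrastive
    # term plus the distillation term scaled by the task-adaptive coefficient.
    return contrastive_loss + coeff * distillation_loss


# Hypothetical usage: per-batch losses logged while training on the previous task.
prev_distill = torch.tensor([0.80, 0.70, 0.75])
prev_contrast = torch.tensor([2.00, 1.90, 2.10])
coeff = adaptive_distillation_coefficient(prev_distill, prev_contrast)  # ~0.375
```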