Continual learning addresses the problem of continuously acquiring and transferring knowledge without catastrophically forgetting previously learned concepts. While humans achieve continual learning through diverse neurocognitive mechanisms, there is a mismatch between these cognitive properties and the methods used to evaluate continual learning models. First, evaluation mostly relies on micro-level metrics, which cannot characterize the cognitive capacities of a model. Second, evaluation is method-specific: it emphasizes a model's strengths in one aspect while obscuring potential weaknesses in others. To address these issues, we propose integrating model cognitive capacities and evaluation metrics into a unified evaluation paradigm. We first characterize model capacities through desiderata derived from the cognitive properties that support human continual learning. The desiderata concern (1) adaptability to task sequences of varying length; (2) sensitivity to dynamic task variations; and (3) efficiency in memory usage and training time. We then design an evaluation protocol for each desideratum to assess the cognitive capacities of recent continual learning models. Experimental results show that none of the methods we consider satisfies all the desiderata, and all of them remain far from truly continual learning. Although some methods exhibit a degree of adaptability and efficiency, no method is able to identify task relationships when encountering dynamic task variations, or to balance learning the similarities and the differences between tasks. Based on these results, we discuss possible factors that influence model performance on these desiderata and provide guidance for improving continual learning models.
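The abstract does not name the micro-level metrics it critiques. As an illustration only, the sketch below assumes the commonly used accuracy matrix, where `acc[i][j]` is the accuracy on task `j` after training on task `i`. From that matrix it computes two typical scalar summaries: average final accuracy and average forgetting. The helper `accuracy_by_length` is hypothetical. It shows one simple way to look at desideratum (1) by re-scoring prefixes of the task sequence, and it is not the authors' protocol.

```python
from typing import List

# Assumed layout: acc[i][j] = accuracy on task j after training on task i.
# Entries with j > i (tasks not yet seen) are placeholders and are ignored.
Matrix = List[List[float]]


def average_accuracy(acc: Matrix) -> float:
    """Mean accuracy over all tasks after training on the final task."""
    final_row = acc[-1]
    return sum(final_row) / len(final_row)


def average_forgetting(acc: Matrix) -> float:
    """Mean drop from each earlier task's best accuracy to its final accuracy."""
    num_tasks = len(acc)
    if num_tasks < 2:
        return 0.0
    drops = []
    for j in range(num_tasks - 1):
        # Best accuracy on task j at any point after it was learned,
        # excluding the final training stage.
        best_before_end = max(acc[i][j] for i in range(j, num_tasks - 1))
        drops.append(best_before_end - acc[-1][j])
    return sum(drops) / len(drops)


def accuracy_by_length(acc: Matrix) -> List[float]:
    """Hypothetical helper: average accuracy on each prefix of length k = 1..T."""
    return [
        average_accuracy([row[:k] for row in acc[:k]])
        for k in range(1, len(acc) + 1)
    ]


if __name__ == "__main__":
    # Toy numbers for illustration only, not results from the paper.
    acc = [
        [0.90, 0.00, 0.00],
        [0.80, 0.85, 0.00],
        [0.70, 0.80, 0.90],
    ]
    print(average_accuracy(acc))
    print(average_forgetting(acc))
    print(accuracy_by_length(acc))
```

A single scalar such as average accuracy hides how performance shifts with sequence length or task relationships. That limitation is what motivates pairing metrics with capacity-oriented desiderata.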