Continual learning refers to the capability of a machine learning model to learn and adapt to new information without compromising its performance on previously learned tasks. Although several studies have investigated continual learning methods for information retrieval, a well-defined task formulation is still lacking, and it is unclear how typical learning strategies perform in this context. To address these gaps, this paper presents a systematic task formulation of continual neural information retrieval, along with a multiple-topic dataset that simulates continuous information retrieval. It then proposes a comprehensive continual neural information retrieval framework consisting of typical retrieval models and continual learning strategies. Empirical evaluations show that the proposed framework can prevent catastrophic forgetting in neural information retrieval and improve performance on previously learned tasks. The results indicate that the continual learning performance of embedding-based retrieval models declines as the topic shift distance and dataset volume of new tasks increase, whereas pretraining-based models show no such correlation. Adopting suitable learning strategies can mitigate the effects of topic shift and data augmentation.