Large language models have achieved remarkable success on a wide range of tasks. However, it is challenging for them to learn new tasks incrementally due to catastrophic forgetting. Existing approaches rely on experience replay, optimization constraints, or task differentiation, all of which face severe limitations in real-world scenarios. To address these issues, we propose Joint Flashback Adaptation. We first introduce flashbacks -- a limited number of prompts from old tasks -- when adapting to new tasks, and constrain the deviation of the model's outputs from those of the original model. We then interpolate latent tasks between flashbacks and new tasks, enabling joint learning of relevant latent tasks, new tasks, and flashbacks; this alleviates the data sparsity of flashbacks and facilitates knowledge sharing for smooth adaptation. Our method requires only a limited number of flashbacks, needs no access to replay data, and is task-agnostic. We conduct extensive experiments on state-of-the-art large language models across 1000+ instruction-following tasks, arithmetic reasoning tasks, and general reasoning tasks. The results demonstrate the superior performance of our method in improving generalization on new tasks and reducing forgetting on old tasks.
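To make the objective described above concrete, here is a minimal sketch of one training step, assuming a Hugging Face-style causal LM, a frozen copy of the original model, and same-length flashback and new-task batches. The mixup-style embedding interpolation for latent tasks, the KL penalty for the output-deviation constraint, and the weighting hyperparameters are illustrative assumptions, not the paper's implementation.

```python
# Sketch of a joint flashback adaptation step: new-task loss + flashback
# output-deviation penalty + interpolated latent-task loss. All design
# choices below (KL penalty, embedding mixup, weights) are assumptions.
import torch
import torch.nn.functional as F


def joint_flashback_step(model, frozen_model, new_batch, flashback_batch,
                         alpha=0.5, lam=1.0):
    """Combine new-task, flashback, and latent-task losses for one step."""
    # 1) Standard language-modeling loss on the new task.
    loss_new = model(**new_batch, labels=new_batch["input_ids"]).loss

    # 2) Flashback constraint: keep next-token distributions on old-task
    #    prompts close to the original (frozen) model via a KL penalty.
    with torch.no_grad():
        ref_logits = frozen_model(**flashback_batch).logits
    cur_logits = model(**flashback_batch).logits
    loss_flash = F.kl_div(
        F.log_softmax(cur_logits, dim=-1),
        F.softmax(ref_logits, dim=-1),
        reduction="batchmean",
    )

    # 3) Latent tasks: interpolate input embeddings of new-task and flashback
    #    examples (assumes matching sequence lengths) and train on the result.
    emb = model.get_input_embeddings()
    mixed = alpha * emb(new_batch["input_ids"]) \
        + (1 - alpha) * emb(flashback_batch["input_ids"])
    loss_latent = model(inputs_embeds=mixed,
                        labels=new_batch["input_ids"]).loss

    return loss_new + lam * loss_flash + loss_latent
```

The returned scalar would be backpropagated as usual; in practice the latent-task weighting and the choice of interpolation scheme would follow the paper's actual formulation.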