Instruction tuning effectively optimizes Large Language Models (LLMs) for downstream tasks. Because real-world application environments change over time, LLMs require continual task-specific adaptation without catastrophic forgetting. Given the heavy computational cost of retraining, replay-based Continual Learning (CL) methods are the simplest and most widely used approach to mitigating forgetting in LLMs. However, traditional replay-based methods do not fully exploit instructions to customize the replay strategy. In this work, we propose a novel paradigm called Instruction-based Continual Learning (InsCL). InsCL dynamically replays previous data according to task similarity, computed with a Wasserstein distance over instructions. Moreover, we introduce an Instruction Information Metric (InsInfo) that quantifies the complexity and diversity of instructions. Guided by InsInfo, InsCL steers the replay process toward high-quality data. We conduct extensive experiments over 16 tasks with different training orders and observe consistent performance improvements from InsCL. After all tasks have been trained, InsCL achieves a 3.0 Relative Gain over Random Replay and a 27.96 Relative Gain over No Replay.
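To make the two mechanisms above concrete, the sketch below illustrates (not reproduces) them. The matching-based Wasserstein estimate is a standard simplification for equal-size embedding sets; the rule "more distant past task, larger replay share" and the rarity-weighted tag score standing in for the paper's tagger-based InsInfo are assumptions for illustration, and all function names (`wasserstein_matching`, `allocate_replay`, `insinfo`) are hypothetical, not the paper's API.

```python
import math
from collections import Counter

import numpy as np
from scipy.optimize import linear_sum_assignment


def wasserstein_matching(a: np.ndarray, b: np.ndarray) -> float:
    """Approximate Wasserstein distance between two equal-size sets of
    instruction embeddings via a minimum-cost one-to-one matching
    (a common discrete-OT simplification, not the paper's exact solver)."""
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return float(cost[rows, cols].mean())


def allocate_replay(current: np.ndarray, past: list[np.ndarray], budget: int) -> np.ndarray:
    """Split a total replay budget across past tasks in proportion to their
    instruction-level distance from the current task.
    Assumption for illustration: more-distant tasks receive more replay."""
    dists = np.array([wasserstein_matching(current, p) for p in past])
    weights = dists / dists.sum()
    return np.round(weights * budget).astype(int)


def insinfo(tags_per_instruction: list[list[str]]) -> list[float]:
    """Hedged stand-in for InsInfo: score each instruction by a
    rarity-weighted count of its intention tags, so instructions with
    more tags and rarer tags (more complex, more diverse) score higher."""
    freq = Counter(t for tags in tags_per_instruction for t in tags)
    n = len(tags_per_instruction)
    return [sum(math.log(n / freq[t]) + 1.0 for t in tags)
            for tags in tags_per_instruction]
```

In a replay loop, `allocate_replay` would decide how many examples to draw from each past task's pool, and `insinfo` scores would bias sampling within each pool toward high-information instructions.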