Continual fine-tuning of large language models (LLMs) is becoming increasingly crucial as these models are deployed in dynamic environments where tasks and data distributions evolve over time. While strong adaptability enables rapid acquisition of new knowledge, it also exposes LLMs to catastrophic forgetting, where previously learned skills degrade during sequential training. Existing replay-based strategies, such as fixed interleaved replay, accuracy-supervised scheduling, and loss-driven scheduling, remain limited: some depend on heuristic rules and provide only partial mitigation of forgetting, while others improve performance but incur substantial computational overhead. Motivated by retention dynamics under sequential fine-tuning, we propose Memory-Inspired Sampler and Scheduler Replay (MSSR), an experience replay framework that estimates sample-level memory strength and schedules rehearsal at adaptive intervals to mitigate catastrophic forgetting while maintaining fast adaptation. Extensive experiments across three backbone models and 11 sequential tasks show that MSSR consistently outperforms state-of-the-art replay baselines, with particularly strong gains on reasoning-intensive and multiple-choice benchmarks.
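To make the "memory strength plus adaptive rehearsal interval" idea concrete, the following is a minimal illustrative sketch, not the paper's actual algorithm: it treats a sample's rehearsal count under low loss as a proxy for memory strength and grows the replay interval with that strength, in the spirit of spaced repetition. All names (`SpacedReplayScheduler`, `base_interval`, `growth`, the loss `threshold`) are hypothetical choices for illustration.

```python
import heapq

class SpacedReplayScheduler:
    """Hedged sketch of adaptive-interval replay scheduling:
    samples that are well retained (low rehearsal loss) are replayed
    less often; samples showing forgetting are replayed sooner."""

    def __init__(self, base_interval=10, growth=2.0):
        self.base_interval = base_interval  # steps before first rehearsal
        self.growth = growth                # interval multiplier per retained recall
        self.queue = []                     # min-heap of (due_step, sample_id)
        self.strength = {}                  # sample_id -> memory-strength proxy

    def add(self, sample_id, step):
        # New replay sample starts with zero strength and a short interval.
        self.strength[sample_id] = 0
        heapq.heappush(self.queue, (step + self.base_interval, sample_id))

    def due(self, step):
        """Return all samples whose rehearsal is due at this training step."""
        ready = []
        while self.queue and self.queue[0][0] <= step:
            _, sid = heapq.heappop(self.queue)
            ready.append(sid)
        return ready

    def report(self, sample_id, step, loss, threshold=0.5):
        """After rehearsing: low loss = retained, so lengthen the interval;
        high loss = forgetting, so reset to the base interval."""
        if loss < threshold:
            self.strength[sample_id] += 1
        else:
            self.strength[sample_id] = 0
        interval = self.base_interval * self.growth ** self.strength[sample_id]
        heapq.heappush(self.queue, (step + int(interval), sample_id))
```

A fixed interleaved-replay baseline corresponds to the degenerate case `growth=1.0`, where every sample is rehearsed at a constant interval regardless of how well it is retained.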