Catastrophic forgetting impairs the continual learning of large language models. We propose Fisher-Guided Gradient Masking (FGGM), a framework that mitigates this problem by strategically selecting which parameters to update using the diagonal Fisher Information. FGGM dynamically generates binary masks with adaptive thresholds, preserving critical parameters to balance stability and plasticity without requiring historical data. Unlike magnitude-based masking methods such as MIGU, our approach provides a mathematically principled estimate of parameter importance. On the TRACE benchmark, FGGM achieves a 9.6% relative improvement over supervised fine-tuning (SFT) in retaining general capabilities and a 4.4% improvement over MIGU on TRACE tasks. Additional analysis on code generation tasks confirms FGGM's superior performance and reduced forgetting, establishing it as an effective solution for continual learning.
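To make the mechanism concrete, here is a minimal PyTorch sketch of Fisher-guided gradient masking under the abstract's description. The helper names (`diagonal_fisher`, `mask_gradients`), the `keep_fraction` parameter, and the per-tensor quantile threshold are illustrative assumptions, not the paper's exact procedure.

```python
import torch

def diagonal_fisher(model, data_loader, loss_fn, n_batches=8):
    """Estimate the diagonal Fisher Information as the mean squared gradient
    of the loss w.r.t. each trainable parameter (a standard approximation)."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()
              if p.requires_grad}
    count = 0
    for inputs, targets in data_loader:
        if count >= n_batches:
            break
        model.zero_grad()
        loss_fn(model(inputs), targets).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
        count += 1
    return {n: f / max(count, 1) for n, f in fisher.items()}

def mask_gradients(model, fisher, keep_fraction=0.7):
    """Zero the gradients of high-Fisher (important) parameters so the next
    optimizer step leaves them untouched; the quantile-based cutoff stands
    in for the paper's adaptive thresholding. keep_fraction is the share of
    low-importance parameters left free to update."""
    for n, p in model.named_parameters():
        if p.grad is None or n not in fisher:
            continue
        # Per-tensor threshold; very large tensors may need a sampled quantile.
        threshold = torch.quantile(fisher[n].float().flatten(), keep_fraction)
        p.grad.mul_((fisher[n] <= threshold).to(p.grad.dtype))

# Typical training step: mask gradients between backward() and step().
# fisher = diagonal_fisher(model, task_loader, loss_fn)
# loss_fn(model(inputs), targets).backward()
# mask_gradients(model, fisher, keep_fraction=0.7)
# optimizer.step()
```

Masking gradients rather than weights keeps the sketch optimizer-agnostic and, consistent with the abstract, requires no historical data: importance is estimated from the current task's batches alone.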