Sequential knowledge editing in large language models often causes catastrophic collapse of the model's general abilities, especially for parameter-modifying methods. Existing approaches mitigate this issue through heuristic constraints on parameter updates, yet the mechanisms underlying such degradation remain insufficiently understood. In this work, we present a spectral analysis of sequential knowledge editing and show that a model's general abilities are closely associated with dominant singular directions of pretrained weight matrices. These directions are highly sensitive to perturbations and are progressively disrupted by repeated edits, closely tracking the collapse in both editing efficacy and general performance. Building on this insight, we propose REVIVE, a plug-and-play framework that stabilizes sequential editing by explicitly preserving the dominant singular subspace. REVIVE represents parameter updates in the spectral basis of the original weights and filters components that would interfere with the protected region. Extensive experiments across multiple models and benchmarks show that REVIVE consistently improves editing efficacy while substantially preserving general abilities under long-horizon sequential editing, including extreme settings with up to 20,000 edits.
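The core idea above can be illustrated with a minimal sketch. This is one plausible interpretation of "filtering components that would interfere with the protected region", not the paper's actual REVIVE implementation: the pretrained weight matrix is decomposed via SVD, and the part of an edit update that writes into the top-`k` (dominant) singular subspace is projected out before the update is applied.

```python
import numpy as np

def filter_update(W, dW, k=2):
    """Hedged sketch: project an edit dW away from the top-k singular
    subspace of the pretrained weights W, so the protected dominant
    directions are left untouched. (Illustrative only; the function
    name, the projection form, and k are assumptions, not REVIVE.)"""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    Uk = U[:, :k]          # dominant left singular vectors
    Vk = Vt[:k, :].T       # dominant right singular vectors
    # Remove the components of dW lying in the protected subspace on
    # either side: dW' = (I - Uk Uk^T) @ dW @ (I - Vk Vk^T).
    P_left = np.eye(W.shape[0]) - Uk @ Uk.T
    P_right = np.eye(W.shape[1]) - Vk @ Vk.T
    return P_left @ dW @ P_right
```

After filtering, the update has no component along the protected singular directions, so repeated edits accumulate only in the residual subspace rather than eroding the dominant one.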