Knowledge editing aims to update specific facts in large language models (LLMs) without full retraining. Prior efforts tune the knowledge layers of LLMs and achieve strong performance in controlled, teacher-forced evaluations. However, they still struggle in real-world autoregressive generation scenarios, which greatly limits their practical applicability. Our empirical analysis reveals two issues: (1) most methods degrade pre-trained capabilities after injecting new knowledge; (2) they may exhibit a discrepancy between stored parametric knowledge and inference-time autoregressive generation behavior. To address these issues, we propose EtCon, an edit-then-consolidate paradigm that couples targeted edits with post-edit consolidation. Specifically, our framework comprises two stages: (1) Targeted Proximal Supervised Fine-Tuning (TPSFT) performs a constrained, targeted edit that updates parametric knowledge while controlling policy drift; (2) Group Relative Policy Optimization (GRPO) consolidates the edit by aligning autoregressive generation trajectories with the intended fact. Extensive experiments demonstrate that EtCon improves editing reliability and real-world generalization while better preserving pre-trained capabilities.
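For concreteness, a minimal PyTorch-style sketch of the two stages follows. The objectives shown are assumptions, not the paper's definitions: `tpsft_loss` reads "controlling policy drift" as a PPO-style clipped per-token probability ratio against the frozen pre-edit policy, and `grpo_advantages` implements the standard critic-free, group-normalized advantage that characterizes GRPO. All function and variable names are hypothetical.

```python
import torch

def tpsft_loss(logp_new: torch.Tensor,
               logp_old: torch.Tensor,
               target_mask: torch.Tensor,
               clip_eps: float = 0.2) -> torch.Tensor:
    """Hypothetical proximal SFT objective for the targeted edit.

    Maximizes the likelihood of the edited fact's tokens while clipping
    the per-token probability ratio against the frozen pre-edit policy,
    limiting how far the updated policy can drift in one step.

    logp_new:    [batch, seq] per-token log-probs under the policy being trained.
    logp_old:    [batch, seq] per-token log-probs under the pre-edit model (detached).
    target_mask: [batch, seq] 1.0 on tokens of the edit target, 0.0 elsewhere.
    """
    ratio = torch.exp(logp_new - logp_old.detach())        # pi_theta / pi_old per token
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    per_token = torch.minimum(ratio, clipped)              # PPO-style pessimistic bound
    return -(per_token * target_mask).sum() / target_mask.sum().clamp_min(1.0)

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages as in GRPO (no learned critic).

    rewards: [num_prompts, group_size] scalar rewards for sampled
    completions, e.g. 1.0 if a generated trajectory states the edited
    fact and 0.0 otherwise. Each reward is normalized by the mean and
    std of its own sampling group.
    """
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)
```

Under this reading, the two stages share a common mechanism: TPSFT bounds the update at the token level during the edit, and GRPO then rewards whole sampled generations that express the new fact, closing the gap between stored parametric knowledge and autoregressive behavior.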