While locate-then-edit knowledge editing efficiently updates knowledge encoded within Large Language Models (LLMs), a critical generalization failure emerges in the practical same-subject knowledge editing scenario: models fail to recall the updated knowledge when following user instructions, despite recalling it successfully when queried in the original edited form. This paper identifies the geometric root of this generalization collapse: the internal activation drift induced by prompt variations exceeds the edited model's geometric tolerance for generalization. We attribute this instability to a dual pathology: (1) joint optimization with orthogonal gradients collapses solutions into sharp minima with narrow stability basins, and (2) the standard covariance constraint paradoxically acts as a Covariance Trap that amplifies input perturbations. To resolve this, we introduce RoSE (Robust Same-subject Editing), which employs Isotropic Geometric Alignment to minimize representational deviation and Hierarchical Knowledge Integration to smooth the optimization landscape. Extensive experiments demonstrate that RoSE significantly improves recall of edited knowledge under user instructions, laying the foundation for robust interactive parametric memory in LLM agents.
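To make the Covariance Trap intuition concrete, below is a minimal NumPy sketch; this is our own toy construction, not the paper's implementation. In ROME/MEMIT-style editors the rank-one update is written along C^{-1}k* (where C is the key covariance and k* the edited key), so a prompt-induced key perturbation ε changes the edited layer's output in proportion to ε^T C^{-1} k*. Along low-variance eigendirections of C, the inverse eigenvalues amplify this term, so equal-norm perturbations produce vastly unequal drift. All dimensions, spectra, and names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # hypothetical key dimension, chosen for illustration

# Synthetic key covariance with a strongly anisotropic spectrum
# (eigenvalues spanning four orders of magnitude).
U, _ = np.linalg.qr(rng.normal(size=(d, d)))
eigvals = np.logspace(0, -4, d)               # from 1.0 down to 1e-4
C = U @ np.diag(eigvals) @ U.T

# A ROME/MEMIT-style rank-one edit writes along C^{-1} k*, so the
# output change at any key k scales with k^T C^{-1} k*.
k_star = rng.normal(size=d)
k_star /= np.linalg.norm(k_star)
edit_dir = np.linalg.solve(C, k_star)         # C^{-1} k*

def spurious_response(eps: np.ndarray) -> float:
    """Unintended output change injected by a key perturbation eps:
    the |eps^T C^{-1} k*| term that a prompt variation contributes."""
    return abs(eps @ edit_dir)

sigma = 1e-2                                  # fixed perturbation norm
eps_high = sigma * U[:, 0]                    # along the highest-variance direction
eps_low = sigma * U[:, -1]                    # along the lowest-variance direction

print(f"drift along high-variance direction: {spurious_response(eps_high):.2e}")
print(f"drift along low-variance  direction: {spurious_response(eps_low):.2e}")
# Equal-norm perturbations are amplified ~1e4x along the low-variance
# direction: the covariance preconditioner traps rare input directions.
```

Under this toy spectrum, the two printed drift values differ by roughly the eigenvalue ratio (about four orders of magnitude), which is the amplification mechanism the abstract attributes to the standard covariance constraint.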