Energy-Regularized Sequential Model Editing on Hyperspheres

From arXiv; accepted at ICLR 2026. The code is available at https://github.com/PlusLabNLP/SPHERE. Project page: https://www.qingyuanliu.net/sphere_projectpage/

Large language models (LLMs) require constant updates to remain aligned with evolving real-world knowledge. Model editing offers a lightweight alternative to retraining, but sequential editing often destabilizes representations and induces catastrophic forgetting. In this work, we seek to better understand and mitigate the performance degradation caused by sequential editing. We hypothesize that hyperspherical uniformity, a property that maintains a uniform distribution of neuron weights on a hypersphere, helps the model remain stable and retain prior knowledge while still accommodating new updates. We use Hyperspherical Energy (HE) to quantify neuron uniformity during editing and examine its correlation with editing performance. Empirical studies across widely used editing methods reveal a strong correlation between HE dynamics and editing performance, with editing failures consistently coinciding with high HE fluctuations. We further prove theoretically that HE dynamics impose a lower bound on the degradation of pretrained knowledge, highlighting why HE stability is crucial for knowledge retention. Motivated by these insights, we propose SPHERE (Sparse Projection for Hyperspherical Energy-Regularized Editing), an HE-driven regularization strategy that stabilizes neuron weight distributions, preserving prior knowledge while enabling reliable sequential updates. Specifically, SPHERE identifies a sparse space complementary to the principal hyperspherical directions of the pretrained weight matrices and projects new knowledge onto it, attenuating perturbations along the principal directions. Extensive experiments on LLaMA3 (8B) and Qwen2.5 (7B) show that SPHERE outperforms the best baseline in editing capability by an average of 16.41%, while most faithfully preserving general model performance, thereby offering a principled path toward reliable large-scale knowledge editing.
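To make the notion of Hyperspherical Energy concrete, here is a minimal sketch in plain Python. The abstract does not state the exact formula, so this assumes a commonly used Riesz s-energy form: each neuron's weight vector is normalized onto the unit hypersphere, and the energy sums the inverse pairwise distances raised to the power s. The function names, the parameter `s`, and the small `eps` guard are illustrative assumptions, not taken from the SPHERE code.

```python
import math


def normalize(v):
    """Scale a vector to unit length so it lies on the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def hyperspherical_energy(neurons, s=1.0, eps=1e-12):
    """Riesz s-energy of unit-normalized neuron vectors (assumed form).

    Sums ||w_i - w_j||^(-s) over all ordered pairs i != j. Lower energy
    means the neuron directions are spread more uniformly on the sphere.
    """
    unit = [normalize(w) for w in neurons]
    energy = 0.0
    for i, wi in enumerate(unit):
        for j, wj in enumerate(unit):
            if i == j:
                continue
            # eps guards against division by zero for identical directions.
            dist = max(math.dist(wi, wj), eps)
            energy += dist ** (-s)
    return energy


# Hypothetical toy "neurons": four evenly spread directions versus four
# directions crowded around the same axis.
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
clustered = [[1.0, 0.0], [1.0, 0.1], [1.0, -0.1], [0.9, 0.05]]

he_spread = hyperspherical_energy(spread)
he_clustered = hyperspherical_energy(clustered)
print(he_spread < he_clustered)  # evenly spread directions have lower energy
```

Under this assumed definition, tracking the energy of a layer's weight rows before and after each edit gives a simple scalar signal of how strongly the edit disturbed neuron uniformity, which is the kind of HE fluctuation the abstract associates with editing failures.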
