Large language models (LLMs) exhibit exceptional performance across diverse domains, yet they face critical safety concerns. Model editing has emerged as an effective approach to mitigating these issues. Existing model editing methods typically focus on optimizing an information matrix that blends new and old knowledge. While effective, these approaches can be computationally expensive and may cause knowledge conflicts. In contrast, we shift our attention to the Hierarchical Orthogonal Residual SprEad (HORSE) of the information matrix, which reduces noisy gradients and enables more stable edits from a different perspective. We demonstrate the effectiveness of HORSE through a clear theoretical comparison with several popular methods and extensive experiments on two datasets across multiple LLMs. The results show that HORSE maintains precise massive editing across diverse scenarios. The code is available at https://github.com/XiaojieGu/HORSE.