Large Language Models (LLMs) have become indispensable tools in science, technology, and society, enabling transformative advances across diverse fields. However, errors or outdated information within these models can undermine their accuracy and restrict their safe deployment. Developing efficient strategies for updating model knowledge without the expense and disruption of full retraining remains a critical challenge. Current model editing techniques frequently struggle to generalize corrections beyond narrow domains, leading to unintended consequences and limiting their practical impact. Here, we introduce a novel framework for editing LLMs, grounded in information bottleneck theory. This approach precisely compresses and isolates the essential information required for generalizable knowledge correction while minimizing disruption to unrelated model behaviors. Building upon this foundation, we present the Information Bottleneck Knowledge Editor (IBKE), which leverages compact latent representations to guide gradient-based updates, enabling robust and broadly applicable model editing. We validate IBKE's effectiveness across multiple LLM architectures and standard benchmark tasks, demonstrating state-of-the-art accuracy and improved generality and specificity of edits. These findings establish a theoretically principled and practical paradigm for open-domain knowledge editing, advancing the utility and trustworthiness of LLMs in real-world applications.
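The core idea described above can be illustrated with a toy sketch. Everything below is hypothetical and illustrative only, not the authors' IBKE implementation: a hidden state `h` is compressed into a latent `z = W h`, and the edit objective combines a correction term (pull a toy "model output" toward the new fact) with a KL-style rate penalty that keeps `z` near a standard-normal prior, limiting how much information the edit injects into the model.

```python
import numpy as np

rng = np.random.default_rng(0)

def ib_edit_loss(W, h, target, beta=0.1):
    """Toy information-bottleneck editing objective (illustrative only)."""
    z = W @ h                          # compressed latent representation
    pred = z.sum()                     # toy "model output" read off the latent
    correction = (pred - target) ** 2  # fit the edited fact
    rate = 0.5 * np.sum(z ** 2)        # KL(N(z, I) || N(0, I)) up to a constant
    return correction + beta * rate

def numerical_grad(f, W, eps=1e-5):
    """Central-difference gradient of f with respect to W."""
    g = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        Wp, Wm = W.copy(), W.copy()
        Wp[idx] += eps
        Wm[idx] -= eps
        g[idx] = (f(Wp) - f(Wm)) / (2 * eps)
    return g

h = rng.normal(size=8)                       # frozen hidden state
W = rng.normal(scale=0.1, size=(4, 8))       # compression weights to be edited
loss = lambda W: ib_edit_loss(W, h, target=1.0)

before = loss(W)
for _ in range(50):                          # gradient-based update of the edit
    W -= 0.05 * numerical_grad(loss, W)
after = loss(W)
```

The `beta` hyperparameter trades off edit fidelity against how far the latent drifts from the prior; in the paper's framing, this is the bottleneck that isolates the minimal information needed for the correction while limiting disruption elsewhere.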