Model editing aims to correct inaccurate knowledge, update outdated information, and incorporate new data into Large Language Models (LLMs) without retraining. This task is particularly challenging in lifelong scenarios, where edits must be applied continuously to serve real-world applications. While some editors demonstrate strong robustness for lifelong editing in pure LLMs, they cannot be directly applied to Vision LLMs (VLLMs), which incorporate an additional vision modality. In this paper, we propose LiveEdit, a LIfelong Vision language modEl Editing framework, to bridge the gap between lifelong LLM editing and VLLMs. We first train an editing expert generator that independently produces a low-rank expert for each editing instance, with the goal of correcting the relevant responses of the VLLM. During inference on the post-edited model, a hard filtering mechanism exploits visual semantic knowledge to coarsely eliminate experts that are visually irrelevant to the input query. Finally, to integrate the remaining visually relevant experts, we introduce a soft routing mechanism based on textual semantic relevance that fuses multiple experts. For evaluation, we establish a benchmark for lifelong VLLM editing. Extensive experiments demonstrate that LiveEdit offers significant advantages in lifelong VLLM editing scenarios, and further experiments validate the rationality and effectiveness of each module design in LiveEdit.
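To make the pipeline concrete, the following is a minimal sketch of the generate–filter–route–fuse flow described above. All names (`make_expert`, `live_edit_forward`), the similarity measure (cosine), the threshold `tau`, and the softmax temperature `temp` are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D, R = 8, 2  # hidden size and expert rank (illustrative; rank R << D)

def make_expert():
    # Hypothetical stand-in for the editing expert generator: each editing
    # instance gets its own low-rank adapter, parameterized as B @ A.
    A = rng.normal(scale=0.1, size=(R, D))
    B = rng.normal(scale=0.1, size=(D, R))
    return A, B

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

# Stored edits: (visual key, textual key, expert) per editing instance.
edits = [(rng.normal(size=D), rng.normal(size=D), make_expert())
         for _ in range(5)]

def live_edit_forward(h, v_query, t_query, tau=0.0, temp=0.1):
    # 1) Hard filtering: coarsely drop experts whose visual key is
    #    irrelevant to the query image (assumed cosine threshold tau).
    kept = [(t_key, exp) for v_key, t_key, exp in edits
            if cos(v_query, v_key) > tau]
    if not kept:
        return h  # no relevant edit: leave the hidden state untouched
    # 2) Soft routing: weight surviving experts by textual relevance.
    sims = np.array([cos(t_query, t_key) for t_key, _ in kept])
    w = np.exp(sims / temp)
    w /= w.sum()
    # 3) Multi-expert fusion: weighted sum of low-rank updates on h.
    delta = sum(wi * (B @ (A @ h)) for wi, (_, (A, B)) in zip(w, kept))
    return h + delta
```

When the visual filter rejects every stored expert, the hidden state passes through unchanged, so queries unrelated to any edit are unaffected by editing.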