Model editing aims to correct errors in large, pretrained models without altering unrelated behaviors. While some recent works have edited vision-language models (VLMs), no existing editor tackles reasoning-heavy tasks, which typically require humans and models to reason about images. We therefore propose ReasonEdit, the first VLM editor that lets users explain their reasoning during editing, introducing a new, practical model editing setup. ReasonEdit continuously stores human reasoning in a codebook and, at inference, retrieves only the relevant facts using a novel topology-balanced multimodal embedding method inspired by network science. Across four VLMs on multiple rationale-based visual question answering datasets, ReasonEdit achieves state-of-the-art editing performance, ultimately showing that incorporating human reasoning during editing greatly improves edit generalization.