Model editing aims to correct errors in large, pretrained models without altering unrelated behaviors. While some recent works have edited vision-language models (VLMs), no existing editors tackle reasoning-heavy tasks, which typically require both humans and models to reason about images. We therefore propose ReasonEdit, the first VLM editor that lets users explain their reasoning during editing, introducing a new, practical model editing setup. ReasonEdit continually stores human reasoning in a codebook and retrieves only the relevant facts at inference time, using a novel topology-balanced multimodal embedding method inspired by network science. Across four VLMs on multiple rationale-based visual question answering datasets, ReasonEdit achieves state-of-the-art editing performance, ultimately showing that using human reasoning during editing greatly improves edit generalization.
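The store-and-retrieve idea behind the codebook can be sketched minimally as follows. This is an illustrative assumption, not the paper's method: the class name, the fixed threshold, and the use of plain cosine similarity stand in for ReasonEdit's topology-balanced multimodal embedding, which is not specified here.

```python
import numpy as np


class ReasoningCodebook:
    """Hypothetical sketch: store (embedding, reasoning) pairs and
    retrieve a stored fact only when the query is similar enough,
    so unrelated inputs leave the model's behavior untouched."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold  # assumed fixed cutoff, not from the paper
        self.keys: list[np.ndarray] = []
        self.values: list[str] = []

    def add(self, embedding: np.ndarray, reasoning: str) -> None:
        # Normalize once so retrieval reduces to a dot product.
        self.keys.append(embedding / np.linalg.norm(embedding))
        self.values.append(reasoning)

    def retrieve(self, query: np.ndarray):
        if not self.keys:
            return None
        q = query / np.linalg.norm(query)
        sims = np.stack(self.keys) @ q  # cosine similarities to all stored keys
        best = int(np.argmax(sims))
        # Below-threshold queries retrieve nothing, preserving unrelated behavior.
        return self.values[best] if sims[best] >= self.threshold else None
```

For example, a query embedding close to a stored key returns the stored rationale, while an orthogonal query returns `None`; the actual system would derive these embeddings jointly from the image and question.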