Knowledge editing for large language models (LLMs) has recently received considerable attention. In comparison, editing Large Vision-Language Models (LVLMs) faces extra challenges from diverse data modalities and complicated model components, and data for LVLM editing are limited. The existing LVLM editing benchmark, which comprises three metrics (Reliability, Locality, and Generality), falls short in the quality of its synthesized evaluation images and cannot assess whether models apply edited knowledge to relevant content. Therefore, we employ more reliable data collection methods to construct a new Large $\textbf{V}$ision-$\textbf{L}$anguage Model $\textbf{K}$nowledge $\textbf{E}$diting $\textbf{B}$enchmark, $\textbf{VLKEB}$, and extend the Portability metric for more comprehensive evaluation. Leveraging a multi-modal knowledge graph, we bind our image data to knowledge entities. This binding can be further used to extract entity-related knowledge, which forms the basis of the editing data. We conduct experiments with different editing methods on five LVLMs and thoroughly analyze how they impact the models. The results reveal the strengths and deficiencies of these methods and, we hope, provide insights for future research. The code and dataset are available at: $\href{https://github.com/VLKEB/VLKEB}{\text{https://github.com/VLKEB/VLKEB}}$.
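To make the four metrics named in the abstract concrete, the following is a minimal sketch of how they are commonly computed in the knowledge-editing literature. It is not taken from the VLKEB codebase: the function and class names (`accuracy`, `agreement`, `evaluate_edit`, `EditReport`), the exact-match scoring, and the text-only toy "models" (images are omitted) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Hypothetical types: a "model" maps a prompt to an answer string.
# A real LVLM would also take an image; that is omitted in this sketch.
Model = Callable[[str], str]
QA = Tuple[str, str]  # (prompt, expected answer)


def accuracy(model: Model, examples: Sequence[QA]) -> float:
    """Fraction of prompts answered with an exact match (assumed scoring rule)."""
    if not examples:
        return 0.0
    return sum(model(q) == a for q, a in examples) / len(examples)


def agreement(before: Model, after: Model, prompts: Sequence[str]) -> float:
    """Fraction of prompts on which the edited model matches the original model."""
    if not prompts:
        return 0.0
    return sum(before(p) == after(p) for p in prompts) / len(prompts)


@dataclass
class EditReport:
    reliability: float  # edited fact is recalled on the edit prompt itself
    generality: float   # edited fact is recalled on rephrased prompts
    locality: float     # unrelated outputs stay unchanged after the edit
    portability: float  # edited fact is applied in related, follow-up questions


def evaluate_edit(before: Model, after: Model,
                  edit_set: Sequence[QA], rephrased_set: Sequence[QA],
                  unrelated_prompts: Sequence[str],
                  portability_set: Sequence[QA]) -> EditReport:
    return EditReport(
        reliability=accuracy(after, edit_set),
        generality=accuracy(after, rephrased_set),
        locality=agreement(before, after, unrelated_prompts),
        portability=accuracy(after, portability_set),
    )


# Toy example: the "editor" only memorizes the exact edit prompt.
base = {"Who directed film F?": "Alice", "Capital of France?": "Paris"}
edited = dict(base)
edited["Who directed film F?"] = "Bob"

report = evaluate_edit(
    before=lambda q: base.get(q, ""),
    after=lambda q: edited.get(q, ""),
    edit_set=[("Who directed film F?", "Bob")],
    rephrased_set=[("Film F was directed by whom?", "Bob")],
    unrelated_prompts=["Capital of France?"],
    portability_set=[("Where was the director of film F born?", "Lyon")],
)
print(report)
```

In this toy setup the lookup-table edit scores 1.0 on Reliability and Locality but 0.0 on Generality and Portability, which illustrates why a benchmark that checks only the first three metrics cannot tell whether edited knowledge carries over to related content.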