Despite their outstanding performance in vision-language reasoning, Large Vision-Language Models (LVLMs) may generate hallucinated content that does not exist in the given image. Most existing LVLM hallucination benchmarks evaluate only object-related hallucinations. Hallucination about the relations between two objects, i.e., relation hallucination, remains under-investigated. To remedy this, we design a unified framework that measures object and relation hallucination in LVLMs simultaneously. The core idea of our framework is to evaluate hallucination on (object, relation, object) triplets extracted from LVLMs' responses, so it generalizes easily to different vision-language tasks. Based on this framework, we further introduce Tri-HE, a novel Triplet-level Hallucination Evaluation benchmark that can be used to study object and relation hallucination at the same time. We conduct comprehensive evaluations on Tri-HE and observe that relation hallucination is even more serious than object hallucination among existing LVLMs, highlighting a previously neglected problem on the path to reliable LVLMs. Moreover, based on our findings, we design a simple yet effective training-free approach to mitigate hallucinations in LVLMs, with which we exceed all open-source counterparts on Tri-HE and achieve performance comparable to the powerful GPT-4V. Our dataset and code for reproducing our experiments are publicly available at https://github.com/wujunjie1998/Tri-HE.
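To make the triplet-level idea concrete, the following is a minimal sketch of how object and relation hallucination could be scored separately over (object, relation, object) triplets. It is not the Tri-HE implementation: the names `Triplet`, `classify`, and `hallucination_rates` are hypothetical, the ground truth is assumed to be a hand-written scene graph, and the judge is assumed to be exact string matching, which is far cruder than any judge a real benchmark would use.

```python
from typing import Iterable, NamedTuple


class Triplet(NamedTuple):
    """An (object, relation, object) triplet; all names here are illustrative."""
    subject: str
    relation: str
    obj: str


def classify(triplet: Triplet, scene_graph: set) -> str:
    """Label one predicted triplet as 'ok', 'object', or 'relation'.

    Assumption: exact string matching against a ground-truth scene graph.
    """
    known_objects = {t.subject for t in scene_graph} | {t.obj for t in scene_graph}
    # An endpoint that is absent from the image counts as object hallucination.
    if triplet.subject not in known_objects or triplet.obj not in known_objects:
        return "object"
    # Both objects exist, but the stated relation between them is unsupported.
    if triplet not in scene_graph:
        return "relation"
    return "ok"


def hallucination_rates(predicted: Iterable[Triplet],
                        scene_graph: Iterable[Triplet]) -> dict:
    """Return the fraction of predicted triplets of each hallucination type."""
    graph = set(scene_graph)
    labels = [classify(t, graph) for t in predicted]
    if not labels:
        return {"object": 0.0, "relation": 0.0}
    total = len(labels)
    return {
        "object": labels.count("object") / total,
        "relation": labels.count("relation") / total,
    }


# Toy ground truth for one image (invented for illustration).
scene_graph = [
    Triplet("man", "riding", "horse"),
    Triplet("horse", "on", "grass"),
]

# Triplets assumed to have been extracted from a model response.
predicted = [
    Triplet("man", "riding", "horse"),     # supported
    Triplet("man", "holding", "umbrella"), # 'umbrella' is not in the image
    Triplet("horse", "next to", "man"),    # objects exist, relation unsupported
    Triplet("man", "on", "grass"),         # objects exist, relation unsupported
]

rates = hallucination_rates(predicted, scene_graph)
print(rates)  # {'object': 0.25, 'relation': 0.5}
```

Because scoring operates on triplets rather than on task-specific outputs, the same function applies to any response from which triplets can be extracted, which is the property the abstract points to when it says the framework generalizes across vision-language tasks.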