Analyzing and Mitigating Object Hallucination in Large Vision-Language Models

Large vision-language models (LVLMs) have shown remarkable abilities in understanding visual information with human languages. However, LVLMs still suffer from object hallucination: generating descriptions that include objects that do not actually exist in the images. This can negatively impact many vision-language tasks, such as visual summarization and reasoning. To address this issue, we propose a simple yet powerful algorithm, LVLM Hallucination Revisor (LURE), which rectifies object hallucination in LVLMs post hoc by reconstructing less hallucinatory descriptions. LURE is grounded in a rigorous statistical analysis of the key factors underlying object hallucination, including co-occurrence (the frequent appearance of certain objects alongside others in images), uncertainty (objects with higher uncertainty during LVLM decoding), and object position (hallucination often appears in the later part of the generated text). LURE can also be seamlessly integrated with any LVLM. We evaluate LURE on six open-source LVLMs, achieving a 23% improvement in general object hallucination evaluation metrics over the previous best approach. In both GPT and human evaluations, LURE consistently ranks at the top. Our data and code are available at https://github.com/YiyangZhou/LURE.
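To make the three factors named in the abstract concrete, the following is a minimal illustrative sketch, not LURE's actual implementation. It assumes uncertainty can be measured as the negative log probability of an object token, position as the token's relative index in the caption, and co-occurrence as a pair count over a reference corpus of object lists. The function names `cooccurrence_counts` and `object_signals` and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(captions_objects):
    """Count how often each unordered object pair appears in the same caption."""
    counts = Counter()
    for objs in captions_objects:
        for a, b in combinations(sorted(set(objs)), 2):
            counts[(a, b)] += 1
    return counts


def object_signals(tokens, probs, objects, pair_counts):
    """Return per-object signals for the first mention of each object token.

    Assumed definitions (illustrative only):
      uncertainty  = -log p(token)
      position     = token index / (caption length - 1)
      cooccurrence = summed pair counts with the other objects in the caption
    """
    present = {t for t in tokens if t in objects}
    signals = {}
    for i, (tok, p) in enumerate(zip(tokens, probs)):
        if tok not in objects or tok in signals:
            continue
        others = present - {tok}
        co = sum(pair_counts[tuple(sorted((tok, o)))] for o in others)
        signals[tok] = {
            "uncertainty": -math.log(p),
            "position": i / max(len(tokens) - 1, 1),
            "cooccurrence": co,
        }
    return signals


# Toy data: object lists from a hypothetical reference corpus.
corpus = [["dog", "frisbee"], ["dog", "frisbee", "grass"], ["cat", "sofa"]]
pair_counts = cooccurrence_counts(corpus)

# A generated caption with hypothetical per-token decoding probabilities.
tokens = ["a", "dog", "on", "grass", "with", "a", "frisbee"]
probs = [0.9, 0.8, 0.9, 0.7, 0.9, 0.9, 0.2]
objects = {"dog", "grass", "frisbee", "cat", "sofa"}

signals = object_signals(tokens, probs, objects, pair_counts)
for name, s in signals.items():
    print(name, s)
```

In this toy caption, "frisbee" is decoded with low probability, appears last, and frequently co-occurs with "dog", so all three signals mark it as the object most worth checking against the image.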

