Evaluating Object Hallucination in Large Vision-Language Models

Inspired by the superior language abilities of large language models (LLMs), large vision-language models (LVLMs) have recently been explored, integrating powerful LLMs to improve performance on complex multimodal tasks. Despite the promising progress on LVLMs, we find that they suffer from the hallucination problem, i.e., they tend to generate descriptions containing objects that are inconsistent with the target images. To investigate this, we present the first systematic study of object hallucination in LVLMs. We conduct evaluation experiments on several representative LVLMs and show that most of them suffer from severe object hallucination. We further discuss how visual instructions may influence hallucination, and find that objects that frequently occur in the visual instructions, or that co-occur with the objects in the image, are clearly prone to being hallucinated by LVLMs. In addition, we find that existing evaluation methods can be affected by the input instructions and the generation styles of LVLMs. We therefore design an improved evaluation method for object hallucination, a polling-based query method called POPE. Experimental results demonstrate that POPE can evaluate object hallucination in a more stable and flexible way. Our code and data are publicly available at https://github.com/RUCAIBox/POPE.
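
To make the idea of a polling-based query more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not give the question wording, the way absent objects are chosen, or the metrics, so the yes/no template, the helper names `build_polls` and `score_polls`, and the `answer_fn` callback standing in for an LVLM are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Poll = Tuple[str, bool]  # (question, expected answer: True means "yes")


def _question(obj: str) -> str:
    # Hypothetical yes/no template; the abstract does not specify the wording.
    article = "an" if obj[:1].lower() in "aeiou" else "a"
    return f"Is there {article} {obj} in the image?"


def build_polls(present: Iterable[str], absent: Iterable[str]) -> List[Poll]:
    """Pair each object with a yes/no question and its ground-truth answer."""
    polls = [(_question(obj), True) for obj in present]
    polls += [(_question(obj), False) for obj in absent]
    return polls


def score_polls(polls: List[Poll], answer_fn: Callable[[str], bool]) -> Dict[str, float]:
    """Treat the polls as binary classification and summarize the answers."""
    tp = fp = tn = fn = 0
    for question, expected in polls:
        predicted = answer_fn(question)  # True if the model answers "yes"
        if predicted and expected:
            tp += 1
        elif predicted and not expected:
            fp += 1  # model claims an absent object: a hallucination
        elif not predicted and expected:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "yes_ratio": (tp + fp) / total if total else 0.0,
    }


if __name__ == "__main__":
    polls = build_polls(present=["dog", "frisbee"], absent=["car", "umbrella"])
    # A stub "model" that answers yes to everything, i.e. hallucinates freely.
    print(score_polls(polls, answer_fn=lambda question: True))
```

Because every probe has a fixed yes/no answer, the score does not depend on parsing free-form captions, which is one plausible reading of why such a scheme would be less sensitive to instructions and generation style. A real evaluation would replace the stub with a call that sends the image and question to the LVLM and maps its reply to a boolean.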
