Large Vision Language Models (LVLMs) exhibit remarkable capabilities but struggle with hallucinations: inconsistencies between images and their descriptions. Previous hallucination evaluation studies on LVLMs have identified hallucinations in terms of objects, attributes, and relations, but overlooked complex hallucinations that create an entire narrative around a fictional entity. In this paper, we introduce a refined taxonomy of hallucinations, featuring a new category: Event Hallucination. We then utilize advanced LLMs to generate and filter fine-grained hallucinatory data consisting of various types of hallucinations, with a particular focus on event hallucinations, laying the groundwork for integrating discriminative and generative evaluation methods within our universal evaluation framework. The proposed benchmark distinctively assesses LVLMs' ability to tackle a broad spectrum of hallucinations, making it a reliable and comprehensive tool for gauging LVLMs' efficacy in handling hallucinations. We will release our code and data.