Hallucination remains a persistent issue in multimodal large language models (MLLMs). Existing research mainly addresses object-level and attribute-level hallucinations, neglecting the more complex relation hallucinations, which require advanced reasoning. Current benchmarks for relation hallucination lack fine-grained evaluation and effective mitigation, and their datasets often suffer from biases introduced by systematic annotation processes. To address these challenges, we introduce Reefknot, a comprehensive benchmark targeting relation hallucinations that comprises over 20,000 real-world samples. We provide a systematic definition of relation hallucination, integrating perceptive and cognitive perspectives, and construct a relation-based corpus from the Visual Genome scene-graph dataset. Our comparative evaluation reveals significant limitations in current MLLMs' ability to handle relation hallucinations. Additionally, we propose a novel confidence-based mitigation strategy that reduces the hallucination rate by an average of 9.75% across three datasets, including Reefknot. Our work offers valuable insights toward trustworthy multimodal intelligence.