Despite significant strides in multimodal tasks, Multimodal Large Language Models (MLLMs) remain plagued by the critical issue of hallucination. Reliably detecting such hallucinations has therefore become vital for evaluating MLLMs and safeguarding their deployment in practice. Prior research in this domain has been constrained by a narrow focus on single tasks, an inadequate range of hallucination categories, and a lack of fine granularity. In response to these challenges, our work broadens the scope of hallucination detection. We present MHaluBench, a novel meta-evaluation benchmark designed to measure advances in hallucination detection methods. Additionally, we introduce UNIHD, a unified multimodal hallucination detection framework that leverages a suite of auxiliary tools to robustly validate the occurrence of hallucinations. We demonstrate the effectiveness of UNIHD through careful evaluation and comprehensive analysis, and offer strategic insights on which tools best address each category of hallucination.
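To make the tool-routed verification idea concrete, the following is a minimal sketch of routing extracted claims to category-specific verifiers. All names here (the claim categories, the `TOOLS` registry, the stubbed verifiers) are illustrative assumptions, not the actual UNIHD implementation:

```python
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    category: str  # hypothetical taxonomy, e.g. "object" or "text"

# Stubbed verifiers standing in for real auxiliary tools
# (e.g. an object detector or an OCR/scene-text checker).
def verify_object(claim: Claim) -> bool:
    # Placeholder logic: a real tool would ground the claim in the image.
    return "unicorn" not in claim.text

def verify_text(claim: Claim) -> bool:
    return True

# Hypothetical registry mapping each hallucination category to a tool.
TOOLS = {"object": verify_object, "text": verify_text}

def detect_hallucinations(claims: list[Claim]) -> list[Claim]:
    """Route each claim to its category's tool; flag claims that fail."""
    flagged = []
    for claim in claims:
        verify = TOOLS.get(claim.category, lambda _c: True)
        if not verify(claim):
            flagged.append(claim)
    return flagged

claims = [Claim("a dog on the grass", "object"),
          Claim("a unicorn in the sky", "object")]
print([c.text for c in detect_hallucinations(claims)])  # → ['a unicorn in the sky']
```

The design point is the per-category dispatch: each hallucination type is checked by the tool best suited to it, and unrecognized categories fall through without being flagged.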