Despite significant progress on multimodal tasks, Multimodal Large Language Models (MLLMs) still suffer from the critical problem of hallucination. Reliable detection of such hallucinations has therefore become vital both for model evaluation and for safeguarding practical deployment. Prior research in this area has been limited by a narrow focus on single tasks, an inadequate range of hallucination categories, and a lack of fine-grained detail. To address these limitations, our work broadens the scope of hallucination detection. We present a novel meta-evaluation benchmark, MHaluBench, designed to evaluate progress in hallucination detection methods. We also introduce a unified multimodal hallucination detection framework, UNIHD, which uses a suite of auxiliary tools to robustly validate the occurrence of hallucinations. We demonstrate the effectiveness of UNIHD through careful evaluation and comprehensive analysis, and we provide insights on which specific tools to apply to different categories of hallucinations.