Despite significant progress on multimodal tasks, Multimodal Large Language Models (MLLMs) remain prone to hallucination. Reliable detection of these hallucinations has therefore become vital both for model evaluation and for safe deployment in practical applications. Prior work in this area has been limited by a narrow focus on single tasks, an inadequate range of hallucination categories, and a lack of fine-grained detail. To address these limitations, we broaden the scope of hallucination detection. We present MHaluBench, a new meta-evaluation benchmark designed to assess progress in hallucination detection methods. We also introduce UNIHD, a unified multimodal hallucination detection framework that uses a suite of auxiliary tools to robustly verify whether hallucinations occur. We demonstrate the effectiveness of UNIHD through careful evaluation and comprehensive analysis, and we offer guidance on which tools to apply to which categories of hallucination.