This survey presents a comprehensive analysis of hallucination in multimodal large language models (MLLMs), also known as large vision-language models (LVLMs), which have demonstrated remarkable abilities in multimodal tasks. Despite this progress, MLLMs often generate outputs that are inconsistent with the visual content, a problem known as hallucination, which poses substantial obstacles to their practical deployment and raises concerns about their reliability in real-world applications. This problem has attracted increasing attention, prompting efforts to detect and mitigate such inaccuracies. We review recent advances in identifying, evaluating, and mitigating these hallucinations, offering a detailed overview of their underlying causes, the evaluation benchmarks and metrics, and the strategies developed to address them. We also analyze current challenges and limitations and formulate open questions that outline potential directions for future research. By providing a fine-grained classification and an overview of the landscape of hallucination causes, evaluation benchmarks, and mitigation methods, this survey aims to deepen the understanding of hallucinations in MLLMs and inspire further advances in the field. Through this review, we contribute to the ongoing dialogue on enhancing the robustness and reliability of MLLMs, providing insights and resources for researchers and practitioners alike. Resources are available at: https://github.com/showlab/Awesome-MLLM-Hallucination.