Multimodal Large Language Models (MLLMs) have recently garnered significant attention and demonstrate outstanding capabilities in various tasks such as OCR, VQA, captioning, $\textit{etc}$. However, hallucination remains a persistent issue. While numerous methods have been proposed to mitigate hallucinations and have achieved notable improvements, they focus primarily on hallucinations about $\textbf{object/noun-related}$ concepts. Verb concepts, which are crucial for understanding human actions, have been largely overlooked. In this paper, to the best of our knowledge, we are the $\textbf{first}$ to investigate the $\textbf{verb hallucination}$ phenomenon of MLLMs from multiple perspectives. Our findings reveal that most state-of-the-art MLLMs suffer from severe verb hallucination. To assess whether existing mitigation methods designed for object concept hallucination also alleviate verb hallucination, we evaluated these methods and found that they do not effectively address it. To tackle this issue, we propose a novel tuning method based on rich verb knowledge to mitigate verb hallucination. Experimental results demonstrate that our method significantly reduces verb-related hallucinations. $\textit{Our code and data will be made publicly available}$.