Multimodal large language models have advanced significantly in recent years, yet they still suffer from the "hallucination problem": the models generate textual descriptions containing content that is inaccurate or absent from the image. To address this issue, this paper introduces Hallucination-Aware Direct Preference Optimization (HA-DPO). Our approach frames hallucination as a preference selection problem, in which the model is trained to favor the non-hallucinating response when presented with two responses for the same image (one accurate and one hallucinating). We also present an efficient pipeline for constructing high-quality, style-consistent hallucination sample pairs to support stable HA-DPO training. We applied this strategy to two mainstream multimodal models; the results show a significant reduction in hallucination and improved generalization. With HA-DPO, MiniGPT-4 improves markedly: POPE accuracy increases from 51.13% to 85.66% (an absolute improvement of 34.5 percentage points), and the MME score rises from 968.58 to 1365.76 (a 41% relative improvement). The code, models, and datasets will be made publicly available.
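The preference formulation above can be illustrated with the standard DPO objective, which we assume HA-DPO builds on; the abstract itself does not state the loss. The sketch below computes the per-pair loss from summed response log-probabilities under the trained policy and a frozen reference model. The function name, argument names, and the default `beta=0.1` are hypothetical and chosen only for illustration.

```python
import math


def dpo_pair_loss(policy_logp_chosen, policy_logp_rejected,
                  ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair (illustrative sketch).

    "chosen" is the non-hallucinating response and "rejected" is the
    hallucinating one, both for the same image. Each argument is the
    summed token log-probability of that response under the policy or
    the frozen reference model. beta is an assumed temperature value.
    """
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    logit = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logit)), written in a numerically stable form.
    if logit >= 0:
        return math.log1p(math.exp(-logit))
    return -logit + math.log1p(math.exp(logit))


# Policy identical to the reference: loss equals log(2).
baseline = dpo_pair_loss(-10.0, -10.0, -10.0, -10.0)
# Policy shifted toward the non-hallucinating response: loss drops.
improved = dpo_pair_loss(-8.0, -12.0, -10.0, -10.0)
print(baseline, improved)
```

Under this objective the loss falls as the policy raises the likelihood of the accurate response relative to the hallucinating one, measured against the reference model. This matches the abstract's description of training the model to favor the non-hallucinating response.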