Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization

Multimodal large language models have advanced significantly in recent years, yet they still suffer from the "hallucination problem": the models generate textual descriptions containing content that is inaccurate or not present in the image. To address this issue, this paper introduces a novel strategy: Hallucination-Aware Direct Preference Optimization (HA-DPO). Our approach treats the hallucination problem as a preference selection problem, where the model is trained to favor the non-hallucinating response when presented with two responses to the same image (one accurate and one hallucinating). This paper also presents an efficient pipeline for constructing hallucination sample pairs, which yields high-quality, style-consistent pairs for stable HA-DPO training. We applied this strategy to two mainstream multimodal models, and the results show a significant reduction in hallucination and an improvement in the models' generalization capabilities. With HA-DPO, MiniGPT-4 improves substantially: POPE accuracy increases from 51.13% to 85.66% (an absolute improvement of 34.5 percentage points), and the MME score rises from 968.58 to 1365.76 (a 41% relative improvement). The code, models, and datasets will be made publicly available.
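
To make the preference-selection framing concrete, the following is a minimal sketch of the standard Direct Preference Optimization loss applied to one pair of responses to the same image. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the exact objective, so HA-DPO may add further terms. The function names, the argument names, and the default `beta` are hypothetical. Each input is assumed to be the summed log-probability of a full response, conditioned on the image and prompt, under either the trained policy or a frozen reference model.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,    # log-prob of the non-hallucinating response under the policy
    policy_rejected_logp: float,  # log-prob of the hallucinating response under the policy
    ref_chosen_logp: float,       # same responses scored by the frozen reference model
    ref_rejected_logp: float,
    beta: float = 0.1,            # assumed temperature; not taken from the source
) -> float:
    """Standard DPO loss for a single (chosen, rejected) response pair."""
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # The loss shrinks as the policy shifts probability mass toward the
    # non-hallucinating response, relative to the reference model.
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))


# A policy identical to the reference sits at the neutral point.
neutral = dpo_loss(-10.0, -10.0, -10.0, -10.0)
# A policy that prefers the accurate response gets a lower loss.
improved = dpo_loss(-8.0, -12.0, -10.0, -10.0)
print(neutral, improved)
```

In this sketch the loss depends only on the difference between the two margins, which may be one reason style-consistent pairs matter: if the two responses differ mainly in hallucinated content, the gradient signal is less likely to reward stylistic differences instead.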

