Multimodal large language models have made significant advances in recent years, yet they still suffer from a common issue known as the "hallucination problem," in which the models generate textual descriptions that inaccurately depict or entirely fabricate content in the associated images. This paper introduces a novel solution, Hallucination-Aware Direct Preference Optimization (HA-DPO), which reframes the hallucination problem as a preference-selection task: given two responses to the same image (one accurate and one hallucinatory), the model is trained to favor the non-hallucinatory response. Furthermore, this paper proposes an efficient pipeline for constructing positive (non-hallucinatory) and negative (hallucinatory) sample pairs, ensuring a high-quality, style-consistent dataset for robust preference learning. When applied to three mainstream multimodal models, HA-DPO significantly reduced hallucination and improved the models' generalization capabilities. Notably, the MiniGPT-4 model, when enhanced with HA-DPO, demonstrated a substantial improvement: POPE accuracy rose from 51.13% to 86.13% (an absolute improvement of 35 percentage points), and the MME score surged from 932.00 to 1326.46 (a relative improvement of 42.32%). The code, models, and datasets are available at https://opendatalab.github.io/HA-DPO.
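To make the preference-selection framing concrete, the following is a minimal sketch of the standard DPO objective (Rafailov et al., 2023) that HA-DPO builds on, applied to a (non-hallucinatory, hallucinatory) response pair. Variable names and the beta value are illustrative assumptions, not the paper's exact implementation; the inputs are the summed per-token log-probabilities of each full response under the trainable policy and a frozen reference model.

```python
import torch
import torch.nn.functional as F


def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss on preference pairs: chosen = non-hallucinatory response,
    rejected = hallucinatory response to the same image.

    beta controls how far the policy may drift from the reference model;
    0.1 is a common default, assumed here for illustration.
    """
    # Log-ratio of policy to reference for each response; acts as an
    # implicit reward without training a separate reward model.
    chosen_reward = policy_chosen_logps - ref_chosen_logps
    rejected_reward = policy_rejected_logps - ref_rejected_logps

    # Maximize the margin between the non-hallucinatory and
    # hallucinatory responses via a logistic (Bradley-Terry) loss.
    logits = beta * (chosen_reward - rejected_reward)
    return -F.logsigmoid(logits).mean()
```

In this formulation, lowering the loss pushes the policy to assign relatively higher likelihood to the accurate description than to the fabricated one, which is exactly the preference behavior the abstract describes.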