Multimodal large language models have made significant advances in recent years, yet they still suffer from a common issue known as the "hallucination problem," in which the models generate textual descriptions that inaccurately depict or entirely fabricate content in the associated images. This paper introduces a novel solution, Hallucination-Aware Direct Preference Optimization (HA-DPO), which reframes the hallucination problem as a preference-selection task: given two responses to the same image (one accurate and one hallucinatory), the model is trained to favor the non-hallucinatory response. Furthermore, this paper proposes an efficient pipeline for constructing positive (non-hallucinatory) and negative (hallucinatory) sample pairs, ensuring a high-quality, style-consistent dataset for robust preference learning. When applied to three mainstream multimodal models, HA-DPO significantly reduced hallucination and improved the models' generalization capabilities. Notably, the MiniGPT-4 model, when enhanced with HA-DPO, demonstrated a substantial improvement: POPE accuracy rose from 51.13% to 86.13% (an absolute improvement of 35 percentage points), and the MME score surged from 932.00 to 1326.46 (a relative improvement of 42.32%). The code, models, and datasets are available at https://opendatalab.github.io/HA-DPO.
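To make the preference-selection framing concrete, the following is a minimal sketch of the standard DPO objective (Rafailov et al., 2023) that HA-DPO builds on, applied to a (non-hallucinatory, hallucinatory) response pair. Variable names and the beta value are illustrative assumptions, not the paper's exact implementation; the inputs are the summed per-token log-probabilities of each full response under the trainable policy and a frozen reference model.

```python
import torch
import torch.nn.functional as F


def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss on preference pairs: chosen = non-hallucinatory response,
    rejected = hallucinatory response to the same image.

    beta controls how far the policy may drift from the reference model;
    0.1 is a common default, assumed here for illustration.
    """
    # Log-ratio of policy to reference for each response; acts as an
    # implicit reward without training a separate reward model.
    chosen_reward = policy_chosen_logps - ref_chosen_logps
    rejected_reward = policy_rejected_logps - ref_rejected_logps

    # Maximize the margin between the non-hallucinatory and
    # hallucinatory responses via a logistic (Bradley-Terry) loss.
    logits = beta * (chosen_reward - rejected_reward)
    return -F.logsigmoid(logits).mean()
```

In this formulation, lowering the loss pushes the policy to assign relatively higher likelihood to the accurate description than to the fabricated one, which is exactly the preference behavior the abstract describes.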