Multimodal Large Language Models (MLLMs) have recently demonstrated impressive capabilities in multimodal understanding, reasoning, and interaction. However, existing MLLMs commonly suffer from serious hallucination, generating text that is not factually grounded in the associated images. This problem makes existing MLLMs untrustworthy and thus impractical in real-world (especially high-stakes) applications. To address the challenge, we present RLHF-V, which enhances MLLM trustworthiness via behavior alignment from fine-grained correctional human feedback. Specifically, RLHF-V collects human preference in the form of segment-level corrections on hallucinations, and performs dense direct preference optimization over the human feedback. Comprehensive experiments on five benchmarks, using both automatic and human evaluation, show that RLHF-V enables substantially more trustworthy MLLM behaviors with promising data and computational efficiency. Remarkably, using only 1.4k annotated samples, RLHF-V reduces the hallucination rate of the base MLLM by 34.8%, outperforming the concurrent LLaVA-RLHF trained on 10k annotated samples. The final model achieves state-of-the-art trustworthiness among open-source MLLMs, and shows better robustness than GPT-4V in preventing hallucinations arising from over-generalization. We open-source our code, model, and data at https://github.com/RLHF-V/RLHF-V.
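The abstract names dense direct preference optimization without spelling it out. The sketch below shows one plausible way to make a DPO loss "dense" over segment-level corrections. It assumes per-token log-probabilities are already computed for the human-corrected response (preferred) and the original hallucinated response (dispreferred), under both the policy and a frozen reference model. It also assumes that tokens inside corrected segments are up-weighted by a factor `gamma` before length normalization. The function names, the weighting scheme, and the `beta`/`gamma` values are illustrative assumptions, not the released RLHF-V implementation.

```python
import math
from typing import Sequence


def weighted_seq_logprob(token_logps: Sequence[float],
                         corrected: Sequence[bool],
                         gamma: float) -> float:
    """Length-normalized log-prob in which corrected-segment tokens weigh `gamma`.

    Assumption (illustrative): unchanged tokens get weight 1, tokens inside a
    human-corrected segment get weight `gamma`, and the weighted sum is
    divided by the total weight.
    """
    if len(token_logps) != len(corrected):
        raise ValueError("token_logps and corrected must have equal length")
    weights = [gamma if c else 1.0 for c in corrected]
    return sum(w * lp for w, lp in zip(weights, token_logps)) / sum(weights)


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dense_dpo_loss(policy_win: Sequence[float], ref_win: Sequence[float],
                   corrected_win: Sequence[bool],
                   policy_lose: Sequence[float], ref_lose: Sequence[float],
                   corrected_lose: Sequence[bool],
                   beta: float = 0.5, gamma: float = 5.0) -> float:
    """DPO loss -log sigmoid(beta * margin) built from segment-weighted log-probs.

    `*_win` lists hold per-token log-probs of the human-corrected response,
    `*_lose` lists those of the original hallucinated response, each under the
    policy or the frozen reference model. `beta` and `gamma` are hypothetical
    illustrative defaults, not values taken from the paper.
    """
    win_ratio = (weighted_seq_logprob(policy_win, corrected_win, gamma)
                 - weighted_seq_logprob(ref_win, corrected_win, gamma))
    lose_ratio = (weighted_seq_logprob(policy_lose, corrected_lose, gamma)
                  - weighted_seq_logprob(ref_lose, corrected_lose, gamma))
    margin = beta * (win_ratio - lose_ratio)
    # -log(sigmoid(m)) equals softplus(-m)
    return softplus(-margin)


if __name__ == "__main__":
    # Toy example: the human correction replaced the last two tokens.
    mask = [False, False, True, True]
    ref_w = [-0.5, -0.7, -2.0, -2.2]
    ref_l = [-0.5, -0.7, -1.0, -1.1]
    # A policy that has shifted probability mass toward the corrected tokens.
    pol_w = [-0.5, -0.7, -1.2, -1.3]
    pol_l = [-0.5, -0.7, -1.8, -2.0]
    print(dense_dpo_loss(pol_w, ref_w, mask, pol_l, ref_l, mask))
```

When the policy equals the reference model the margin is zero and the loss is log 2; it drops below log 2 as the policy favors the corrected segments more than the reference does. Up-weighting only the corrected tokens concentrates the preference signal on the hallucinated spans instead of spreading it over the shared, unchanged text.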