Traditional feedback learning for hallucination reduction relies on labor-intensive manual labeling or expensive proprietary models. This leaves the community without foundational knowledge of how to construct high-quality feedback with open-source MLLMs. In this work, we introduce RLAIF-V, a novel framework that aligns MLLMs in a fully open-source paradigm. RLAIF-V maximally exploits open-source MLLMs from two perspectives: generating high-quality feedback data for preference learning and providing self-feedback guidance for inference-time scaling. Extensive experiments on six benchmarks, in both automatic and human evaluation, show that RLAIF-V substantially enhances model trustworthiness at both preference-learning and inference time. RLAIF-V 7B reduces object hallucination by 80.7\% and overall hallucination by 33.7\%. Remarkably, RLAIF-V 12B further reveals the self-alignment potential of open-source MLLMs: the model can learn from its own feedback to achieve trustworthiness surpassing that of GPT-4V.