Recent studies have shown that large vision-language models (LVLMs) often suffer from the issue of object hallucinations (OH). To mitigate this issue, we introduce an efficient method that edits the model weights based on an unsafe subspace, which we call HalluSpace in this paper. Given truthful and hallucinated text prompts paired with the same visual content as inputs, the HalluSpace can be identified by extracting the embedding features of hallucinated prompts and removing the truthful representations in LVLMs. By orthogonalizing the model weights, input features are projected into the null space of the HalluSpace to reduce OH, which is why we name our method Nullu. We reveal that HalluSpaces generally contain the statistical biases and unimodal priors of the large language models (LLMs) used to build LVLMs, which previous studies have shown to be essential causes of OH. Null-space projection therefore suppresses these LLM priors and filters out the hallucinated features, resulting in contextually accurate outputs. Experiments show that our method effectively mitigates OH across different LVLM families without extra inference costs and also achieves strong performance on general LVLM benchmarks. Code is released at \url{https://github.com/Ziwei-Zheng/Nullu}.
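The core idea above can be sketched numerically. The following is a minimal, hypothetical illustration (not the authors' implementation): it assumes the HalluSpace is spanned by the top singular directions of the difference between hallucinated and truthful embedding features, and that weight editing amounts to composing a weight matrix with a null-space projector. All shapes, variable names, and the choice of rank `k` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64   # embedding dimension (assumed)
n = 200  # number of paired truthful/hallucinated prompts (assumed)

# Synthetic stand-ins for embedding features extracted from an LVLM.
E_halluc = rng.normal(size=(n, d))  # features of hallucinated prompts
E_truth = rng.normal(size=(n, d))   # features of truthful prompts

# The difference matrix captures directions associated with hallucination,
# i.e. hallucinated features with the truthful representations removed.
D = E_halluc - E_truth

# Assume the top-k right singular vectors span the "HalluSpace".
k = 4
_, _, Vt = np.linalg.svd(D, full_matrices=False)
H = Vt[:k]  # (k, d) orthonormal basis of the HalluSpace

# Null-space projector: P = I - H^T H removes any HalluSpace component.
P = np.eye(d) - H.T @ H

# Editing a weight matrix so that its inputs are first projected into
# the null space of the HalluSpace (one hypothetical form of "orthogonalizing").
W = rng.normal(size=(d, d))  # a weight matrix acting on input features
W_edited = W @ P

# A feature lying entirely in the HalluSpace is now annihilated.
v = H[0]
print(np.allclose(W_edited @ v, 0.0))  # True
```

Because `P` is an orthogonal projector onto the complement of the HalluSpace, the edited weights leave directions orthogonal to the HalluSpace unchanged while zeroing out the hallucination-associated component, which matches the abstract's description of filtering hallucinated features at no extra inference cost (the edit is applied to the weights once, offline).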