Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective - 专知论文

会员服务 ·

0

多峰值 · MoDELS · Less · 训练数据 · 可约的 ·

2024 年 2 月 22 日

Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective

翻译：少即是多：从EOS决策角度缓解多模态幻觉

Zihao Yue,Liang Zhang,Qin Jin

Large Multimodal Models (LMMs) often suffer from multimodal hallucinations, wherein they may create content that is not present in the visual inputs. In this paper, we explore a new angle of this issue: overly detailed training data hinders the model's ability to timely terminate generation, leading to continued outputs beyond visual perception limits. By investigating how the model decides to terminate generation with EOS, the special end-of-sentence token, we find that the model assesses the completeness of the entire sequence by comparing the generated text with the image. This observation suggests that the model possesses an inherent potential of making proper EOS decisions based on its visual perception to avoid overly lengthy outputs. To take advantage of such potential, we explore two methods to mitigate multimodal hallucinations: a training objective that enables the model to reduce hallucinations by learning from regular instruction data, and a data filtering strategy to prevent harmful training data from exacerbating model hallucinations. Both methods significantly improve the hallucination performance of LMMs, without requiring any additional data or knowledge.

翻译：大型多模态模型（LMMs）常面临多模态幻觉问题，即可能生成视觉输入中不存在的虚假内容。本文从新视角探索该问题：过详细的训练数据阻碍模型及时终止生成，导致输出内容超出视觉感知范围。通过研究模型如何利用特殊句尾标记EOS决策终止生成，我们发现模型通过比较已生成文本与图像来评估整个序列的完整性。这一观察表明，模型具备基于视觉感知做出恰当EOS决策的固有潜力，从而避免输出过度冗长。为发挥该潜力，我们探索两种缓解多模态幻觉的方法：一种训练目标使模型通过从常规指令数据中学习来减少幻觉，另一种数据过滤策略则防止有害训练数据加剧模型幻觉。两种方法无需额外数据或知识，即可显著改善LMMs的幻觉性能。

0

相关内容

多峰值

【CVPR 2022】基于元内存传输的跨域少镜头语义分割，Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

【CVPR 2022】基于元内存传输的跨域少镜头语义分割，Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

专知会员服务

13+阅读 · 2022年3月12日

【CHI2020-微软】解释可解释性:理解数据科学家使用机器学习的可解释性工具，Interpreting Interpretability: Understanding Data Scientists’Use of Interpretability Tools for Machine Learning

【CHI2020-微软】解释可解释性:理解数据科学家使用机器学习的可解释性工具，Interpreting Interpretability: Understanding Data Scientists’Use of Interpretability Tools for Machine Learning

专知会员服务

55+阅读 · 2020年3月8日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

32+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

STRCF for Visual Object Tracking

STRCF for Visual Object Tracking

统计学习与视觉计算组

15+阅读 · 2018年5月29日

Focal Loss for Dense Object Detection

Focal Loss for Dense Object Detection

统计学习与视觉计算组

12+阅读 · 2018年3月15日

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

KingsGarden

13+阅读 · 2017年7月16日

From Softmax to Sparsemax-ICML16（1）

From Softmax to Sparsemax-ICML16（1）

KingsGarden

74+阅读 · 2016年11月26日

城市“建成环境——空间行为”的多尺度影响关系与机理研究

国家自然科学基金

13+阅读 · 2017年12月31日

“Fishes-in-net” 酵母孢子微胶囊式近平滑假丝酵母SCRII酶有机相高效手性合成机制研究

国家自然科学基金

3+阅读 · 2016年12月31日

Musielak-Orlicz-Sobolev 空间中的迹嵌入及其应用

国家自然科学基金

2+阅读 · 2015年12月31日

DMB信号水汽探测方法若干问题研究

国家自然科学基金

3+阅读 · 2015年12月31日

氢键(O:H-O)受激弛豫势能路径的键弛豫理论与声子谱学实验标定

国家自然科学基金

0+阅读 · 2015年12月31日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

47+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

动态Gr？bner 基与GVW算法

国家自然科学基金

0+阅读 · 2014年12月31日

Poisson流形上的修正Hamilton方法

国家自然科学基金

0+阅读 · 2014年12月31日

海量Web用户生成内容物化关键技术

国家自然科学基金

2+阅读 · 2014年12月31日

The (R)Evolution of Multimodal Large Language Models: A Survey

Arxiv

16+阅读 · 2024年2月19日

Generative Agent-Based Social Networks for Disinformation: Research Opportunities and Open Challenges

Arxiv

57+阅读 · 2023年10月11日

A Survey on Privacy in Graph Neural Networks: Attacks, Preservation, and Applications

Arxiv

11+阅读 · 2023年8月31日

A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT

Arxiv

33+阅读 · 2023年2月18日

A Survey on Graph Neural Networks and Graph Transformers in Computer Vision: A Task-Oriented Perspective

Arxiv

21+阅读 · 2022年9月27日

Transformers in Time Series: A Survey

Arxiv

34+阅读 · 2022年2月15日

Transformers in Medical Imaging: A Survey

Arxiv

15+阅读 · 2022年1月24日

How Does Knowledge Graph Embedding Extrapolate to Unseen Data: a Semantic Evidence View

Arxiv

15+阅读 · 2022年1月5日

From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression

Arxiv

10+阅读 · 2021年12月14日

Learning from History: Modeling Temporal Knowledge Graphs with Sequential Copy-Generation Networks

Arxiv

11+阅读 · 2020年12月15日

VIP会员

文章信息

相关主题

最新内容

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

专知会员服务

3+阅读 · 6月19日

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

专知会员服务

5+阅读 · 6月19日

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

专知会员服务

6+阅读 · 6月18日

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

专知会员服务

7+阅读 · 6月18日

《廉价自杀式无人机战争的军事战略影响：乌克兰和伊朗案例研究》

《廉价自杀式无人机战争的军事战略影响：乌克兰和伊朗案例研究》

专知会员服务

11+阅读 · 6月18日

《面向反无人机作战的联邦式可解释射频–光电/红外情报融合：边缘人工智能优化、电子战韧性及分布式监视验证》

《面向反无人机作战的联邦式可解释射频–光电/红外情报融合：边缘人工智能优化、电子战韧性及分布式监视验证》

专知会员服务

10+阅读 · 6月18日

ICML 2026 | FR3D：解耦自车运动的未来动态三维重建世界模型

ICML 2026 | FR3D：解耦自车运动的未来动态三维重建世界模型

专知会员服务

7+阅读 · 6月17日

【伯克利博士论文】迈向可扩展与自我演进的大语言模型智能体

【伯克利博士论文】迈向可扩展与自我演进的大语言模型智能体

专知会员服务

11+阅读 · 6月17日

学习数据的几何：形状空间分析数学综述

学习数据的几何：形状空间分析数学综述

专知会员服务

7+阅读 · 6月17日

《现代防空系统综述：架构、传感器、拦截器及新兴威胁环境对基础设施受限防御环境的影响》2026最新长综述

《现代防空系统综述：架构、传感器、拦截器及新兴威胁环境对基础设施受限防御环境的影响》2026最新长综述

专知会员服务

15+阅读 · 6月17日

定向能反无人机系统最新发展动态

定向能反无人机系统最新发展动态

专知会员服务

8+阅读 · 6月17日

从燃煤战舰到算法战争：水面指挥的永恒要求

从燃煤战舰到算法战争：水面指挥的永恒要求

专知会员服务

6+阅读 · 6月17日

《短程弹道再入飞行器拦截时间中的一项异常现象》

《短程弹道再入飞行器拦截时间中的一项异常现象》

专知会员服务

8+阅读 · 6月17日

《基于回归方法与任务上下文的对抗环境动态战术网络报文优先级排序》

《基于回归方法与任务上下文的对抗环境动态战术网络报文优先级排序》

专知会员服务

8+阅读 · 6月17日

美智库《战术级指挥控制的迫切要求：构建弹性机动式指挥控制网络》报告

美智库《战术级指挥控制的迫切要求：构建弹性机动式指挥控制网络》报告

专知会员服务

10+阅读 · 6月17日

相关VIP内容

【CVPR 2022】基于元内存传输的跨域少镜头语义分割，Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

【CVPR 2022】基于元内存传输的跨域少镜头语义分割，Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

专知会员服务

13+阅读 · 2022年3月12日

【CHI2020-微软】解释可解释性:理解数据科学家使用机器学习的可解释性工具，Interpreting Interpretability: Understanding Data Scientists’Use of Interpretability Tools for Machine Learning

【CHI2020-微软】解释可解释性:理解数据科学家使用机器学习的可解释性工具，Interpreting Interpretability: Understanding Data Scientists’Use of Interpretability Tools for Machine Learning

专知会员服务

55+阅读 · 2020年3月8日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

32+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

STRCF for Visual Object Tracking

STRCF for Visual Object Tracking

统计学习与视觉计算组

15+阅读 · 2018年5月29日

Focal Loss for Dense Object Detection

Focal Loss for Dense Object Detection

统计学习与视觉计算组

12+阅读 · 2018年3月15日

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

KingsGarden

13+阅读 · 2017年7月16日

From Softmax to Sparsemax-ICML16（1）

From Softmax to Sparsemax-ICML16（1）

KingsGarden

74+阅读 · 2016年11月26日

相关论文

The (R)Evolution of Multimodal Large Language Models: A Survey

Arxiv

16+阅读 · 2024年2月19日

Generative Agent-Based Social Networks for Disinformation: Research Opportunities and Open Challenges

Arxiv

57+阅读 · 2023年10月11日

A Survey on Privacy in Graph Neural Networks: Attacks, Preservation, and Applications

Arxiv

11+阅读 · 2023年8月31日

A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT

Arxiv

33+阅读 · 2023年2月18日

A Survey on Graph Neural Networks and Graph Transformers in Computer Vision: A Task-Oriented Perspective

Arxiv

21+阅读 · 2022年9月27日

Transformers in Time Series: A Survey

Arxiv

34+阅读 · 2022年2月15日

Transformers in Medical Imaging: A Survey

Arxiv

15+阅读 · 2022年1月24日

How Does Knowledge Graph Embedding Extrapolate to Unseen Data: a Semantic Evidence View

Arxiv

15+阅读 · 2022年1月5日

From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression

Arxiv

10+阅读 · 2021年12月14日

Learning from History: Modeling Temporal Knowledge Graphs with Sequential Copy-Generation Networks

Arxiv

11+阅读 · 2020年12月15日

相关基金

城市“建成环境——空间行为”的多尺度影响关系与机理研究

国家自然科学基金

13+阅读 · 2017年12月31日

“Fishes-in-net” 酵母孢子微胶囊式近平滑假丝酵母SCRII酶有机相高效手性合成机制研究

国家自然科学基金

3+阅读 · 2016年12月31日

Musielak-Orlicz-Sobolev 空间中的迹嵌入及其应用

国家自然科学基金

2+阅读 · 2015年12月31日

DMB信号水汽探测方法若干问题研究

国家自然科学基金

3+阅读 · 2015年12月31日

氢键(O:H-O)受激弛豫势能路径的键弛豫理论与声子谱学实验标定

国家自然科学基金

0+阅读 · 2015年12月31日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

47+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

动态Gr？bner 基与GVW算法

国家自然科学基金

0+阅读 · 2014年12月31日

Poisson流形上的修正Hamilton方法

国家自然科学基金

0+阅读 · 2014年12月31日

海量Web用户生成内容物化关键技术

国家自然科学基金

2+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员