The Role of Entropy in Visual Grounding: Analysis and Optimization

December 7, 2025

Shuo Li, Jiajun Sun, Zhihao Zhang, Xiaoran Fan, Senjie Jin, Hui Li, Yuming Yang, Junjie Ye, Lixing Shen, Tao Ji, Tao Gui, Qi Zhang, Xuanjing Huang

Recent advances in fine-tuning multimodal large language models (MLLMs) using reinforcement learning have achieved remarkable progress, particularly with the introduction of various entropy control techniques. However, the role and characteristics of entropy in perception-oriented tasks like visual grounding, as well as effective strategies for controlling it, remain largely unexplored. To address this issue, we focus on the visual grounding task and analyze the role and characteristics of entropy in comparison to reasoning tasks. Building on these findings, we introduce ECVGPO (Entropy Control Visual Grounding Policy Optimization), an interpretable algorithm designed for effective entropy regulation. Through entropy control, the trade-off between exploration and exploitation is better balanced. Experiments show that ECVGPO achieves broad improvements across various benchmarks and models.
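The abstract does not specify how ECVGPO measures or regulates entropy. As a point of reference only, the quantity usually meant by "entropy" in reinforcement-learning fine-tuning is the Shannon entropy of the policy's next-token distribution. The sketch below computes that standard quantity from raw logits; the function names `token_entropy` and `mean_sequence_entropy` are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def token_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (in nats) of the softmax distribution over one token's logits.

    This is the generic definition, not the ECVGPO objective.
    """
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_sequence_entropy(step_logits: Sequence[Sequence[float]]) -> float:
    """Average per-token entropy over the steps of one generated sequence."""
    return sum(token_entropy(step) for step in step_logits) / len(step_logits)
```

A uniform distribution over the vocabulary gives the maximum entropy (more exploration), while a sharply peaked distribution gives entropy near zero (more exploitation). This is the trade-off that entropy control methods try to balance.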

