Steering Where to Listen: Instruction-Based Activation Steering Redirects Temporal Attention in Large Audio-Language Models - 专知论文

会员服务 ·

0

时序 · 大型音频语言模型 · 语言模型 · 事件 · 音频信号 ·

Steering Where to Listen: Instruction-Based Activation Steering Redirects Temporal Attention in Large Audio-Language Models

翻译：引导听音位置：基于指令的激活操控重定向大型音频语言模型中的时序注意力

Tsung-En Lin,Hung-Yi Lee

Large Audio-Language Models (LALMs) excel at audio understanding but expose little about where in an audio signal they attend. We introduce instruction-based vector steering, which constructs a steering vector by contrasting activations from differently instructed prompts while keeping the audio fixed. Through a systematic probe of LALM attention, we find that - unlike standard prompting or audio-based steering - this intervention significantly redistributes the temporal attention allocated to audio tokens, concentrating it on acoustically relevant regions. We then show that this attention shift is behaviorally meaningful: in a controlled three-event setting, reading out the temporal position of maximal steering-induced attention change recovers the location of a queried sound event without any training, attaining 60.87% and 68.72% overlap with ground-truth intervals on Qwen2-Audio and Audio Flamingo 3, far above direct prompting (31.84%, 46.75%) and random baselines (27.74%). Our results characterize a mechanistic property of instruction-based steering in LALMs and provide a training-free probe for the latent temporal structure these models encode.

翻译：大型音频语言模型（LALMs）在音频理解方面表现出色，但对其在音频信号中的注意力分布却知之甚少。我们提出一种基于指令的向量操控方法，通过对比不同指令提示下的激活值（同时保持音频输入不变）来构建操控向量。通过对LALM注意力的系统探测，我们发现——与标准提示或基于音频的操控不同——这种干预显著重新分配了分配给音频令牌的时序注意力，将其集中在声学相关的区域上。我们进一步证明这种注意力转移具有行为学意义：在受控的三事件场景中，通过读取由操控引起的最大注意力变化的时间位置，可在无需任何训练的情况下恢复查询声音事件的位置，在Qwen2-Audio和Audio Flamingo 3上分别实现了60.87%和68.72%的真实区间重叠率，远高于直接提示（31.84%，46.75%）和随机基线（27.74%）。我们的结果揭示了LALMs中基于指令操控的机制特性，并提供了一种无需训练的探测方法，用于揭示这些模型编码的潜在时序结构。

0

相关内容

TransMLA：多头潜在注意力（MLA）即为所需

TransMLA：多头潜在注意力（MLA）即为所需

专知会员服务

23+阅读 · 2025年2月13日

《多模态大语言模型视觉提示》综述

《多模态大语言模型视觉提示》综述

专知会员服务

36+阅读 · 2024年9月25日

大型语言模型的高效提示方法综述

大型语言模型的高效提示方法综述

专知会员服务

75+阅读 · 2024年4月2日

大模型时代还不理解自注意力(Self-Attention)？这篇文章教你从头写代码实现

大模型时代还不理解自注意力(Self-Attention)？这篇文章教你从头写代码实现

专知会员服务

36+阅读 · 2024年2月12日

如何提示？浙大最新《大型语言模型提示框架》综述

如何提示？浙大最新《大型语言模型提示框架》综述

专知会员服务

83+阅读 · 2023年11月23日

大模型中视觉指令调优怎么做？腾讯最新《视觉-语言指令调优》综述与分析

大模型中视觉指令调优怎么做？腾讯最新《视觉-语言指令调优》综述与分析

专知会员服务

45+阅读 · 2023年11月18日

《大型语言模型指令调优》综述

《大型语言模型指令调优》综述

专知会员服务

74+阅读 · 2023年8月27日

中科大腾讯最新《多模态大型语言模型》综述，详述多模态指令微调、上下文学习、思维链和辅助视觉推理技术

中科大腾讯最新《多模态大型语言模型》综述，详述多模态指令微调、上下文学习、思维链和辅助视觉推理技术

专知会员服务

105+阅读 · 2023年6月27日

【ICML2022】Branchformer:并行MLP-Attention架构，捕捉局部和全局上下文，用于语音识别和理解

【ICML2022】Branchformer:并行MLP-Attention架构，捕捉局部和全局上下文，用于语音识别和理解

专知会员服务

25+阅读 · 2022年7月8日

【清华大学】Delta调优:预训练语言模型参数有效方法的综合研究，Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models

【清华大学】Delta调优:预训练语言模型参数有效方法的综合研究，Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models

专知会员服务

26+阅读 · 2022年3月15日

注意力机制 | 图卷积多跳注意力机制 | Direct multi-hop Attention based GNN

注意力机制 | 图卷积多跳注意力机制 | Direct multi-hop Attention based GNN

AINLP

22+阅读 · 2020年11月29日

【论文笔记】通过自注意力网络的动态图表示学习

【论文笔记】通过自注意力网络的动态图表示学习

专知

90+阅读 · 2019年12月2日

深度学习注意力机制-Attention in Deep learning-附101页PPT

深度学习注意力机制-Attention in Deep learning-附101页PPT

专知

139+阅读 · 2019年9月23日

Attention！注意力机制模型最新综述

Attention！注意力机制模型最新综述

专知

65+阅读 · 2019年4月8日

从Seq2seq到Attention模型到Self Attention（一）

从Seq2seq到Attention模型到Self Attention（一）

量化投资与机器学习

76+阅读 · 2018年10月8日

FAGAN：完全注意力机制（Full Attention）GAN，Self-attention+GAN

FAGAN：完全注意力机制（Full Attention）GAN，Self-attention+GAN

专知

32+阅读 · 2018年8月14日

干货 | NLP中的self-attention【自-注意力】机制

干货 | NLP中的self-attention【自-注意力】机制

机器学习算法与Python学习

12+阅读 · 2018年4月11日

自然语言处理中的自注意力机制（Self-Attention Mechanism）

自然语言处理中的自注意力机制（Self-Attention Mechanism）

PaperWeekly

22+阅读 · 2018年3月28日

原创 | Attention Modeling for Targeted Sentiment

原创 | Attention Modeling for Targeted Sentiment

黑龙江大学自然语言处理实验室

25+阅读 · 2017年11月5日

深度学习中的注意力机制

深度学习中的注意力机制

人工智能头条

16+阅读 · 2017年11月2日

基于因子分析的会话语音说话人识别研究

国家自然科学基金

1+阅读 · 2015年12月31日

面向CELP语音压缩域的通用隐写分析方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

听力损失系统双耳声源定位模型研究

国家自然科学基金

1+阅读 · 2015年12月31日

混沌时间序列Volterra建模及其在语音信号处理中的应用

国家自然科学基金

0+阅读 · 2015年12月31日

基于犹豫模糊语言信息的定性决策理论与方法

国家自然科学基金

2+阅读 · 2015年12月31日

深度学习框架下基于情境线索的视觉注意研究

国家自然科学基金

2+阅读 · 2015年12月31日

基于实时fMRI解码与脑网络建模的听觉信息认知加工机理研究

国家自然科学基金

0+阅读 · 2015年12月31日

精神压力下基于物理模型的变异语音生成机理探索及检测方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

双微阵列语音增强与定位方法研究

国家自然科学基金

0+阅读 · 2014年12月31日

语音识别中的稀疏性深度学习

国家自然科学基金

11+阅读 · 2012年12月31日

A Closer Look at Failure Modes in Temporal Understanding of Large Audio-Language Models

Arxiv

0+阅读 · 6月16日

EChO-Agent: Evidence Chain Orchestration Agent for Audio Reasoning

Arxiv

0+阅读 · 6月13日

From Senses to Decisions: The Information Flow of Auditory and Visual Perception in Multimodal LLMs

Arxiv

0+阅读 · 6月8日

SpeechJBB: Probing Safety Alignment and Comprehension in Large Audio Language Models under Code-Switched Speech

Arxiv

0+阅读 · 6月8日

TinyGiantALM: A Compact Audio-Language Model for Intent-Aware Reasoning under Resource Constraints

Arxiv

0+阅读 · 6月7日

SpeechEditBench: A Bilingual Multi-Attribute Benchmark for Instruction-Guided Speech Editing

Arxiv

0+阅读 · 6月3日

EvA: An Evidence-First Audio Understanding Paradigm for LALMs

Arxiv

0+阅读 · 5月28日

Learning When to Think While Listening in Large Audio-Language Models

Arxiv

0+阅读 · 5月26日

A Survey of Large Audio Language Models: Generalization, Trustworthiness, and Outlook

Arxiv

0+阅读 · 5月18日

SpotSound: Enhancing Large Audio-Language Models with Fine-Grained Temporal Grounding

Arxiv

0+阅读 · 4月14日

VIP会员

文章信息

相关主题

大型音频语言模型

最新内容

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

专知会员服务

3+阅读 · 6月19日

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

专知会员服务

5+阅读 · 6月19日

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

专知会员服务

6+阅读 · 6月18日

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

专知会员服务

7+阅读 · 6月18日

《廉价自杀式无人机战争的军事战略影响：乌克兰和伊朗案例研究》

《廉价自杀式无人机战争的军事战略影响：乌克兰和伊朗案例研究》

专知会员服务

11+阅读 · 6月18日

《面向反无人机作战的联邦式可解释射频–光电/红外情报融合：边缘人工智能优化、电子战韧性及分布式监视验证》

《面向反无人机作战的联邦式可解释射频–光电/红外情报融合：边缘人工智能优化、电子战韧性及分布式监视验证》

专知会员服务

10+阅读 · 6月18日

ICML 2026 | FR3D：解耦自车运动的未来动态三维重建世界模型

ICML 2026 | FR3D：解耦自车运动的未来动态三维重建世界模型

专知会员服务

7+阅读 · 6月17日

【伯克利博士论文】迈向可扩展与自我演进的大语言模型智能体

【伯克利博士论文】迈向可扩展与自我演进的大语言模型智能体

专知会员服务

11+阅读 · 6月17日

学习数据的几何：形状空间分析数学综述

学习数据的几何：形状空间分析数学综述

专知会员服务

7+阅读 · 6月17日

《现代防空系统综述：架构、传感器、拦截器及新兴威胁环境对基础设施受限防御环境的影响》2026最新长综述

《现代防空系统综述：架构、传感器、拦截器及新兴威胁环境对基础设施受限防御环境的影响》2026最新长综述

专知会员服务

15+阅读 · 6月17日

定向能反无人机系统最新发展动态

定向能反无人机系统最新发展动态

专知会员服务

8+阅读 · 6月17日

从燃煤战舰到算法战争：水面指挥的永恒要求

从燃煤战舰到算法战争：水面指挥的永恒要求

专知会员服务

6+阅读 · 6月17日

《短程弹道再入飞行器拦截时间中的一项异常现象》

《短程弹道再入飞行器拦截时间中的一项异常现象》

专知会员服务

8+阅读 · 6月17日

《基于回归方法与任务上下文的对抗环境动态战术网络报文优先级排序》

《基于回归方法与任务上下文的对抗环境动态战术网络报文优先级排序》

专知会员服务

8+阅读 · 6月17日

美智库《战术级指挥控制的迫切要求：构建弹性机动式指挥控制网络》报告

美智库《战术级指挥控制的迫切要求：构建弹性机动式指挥控制网络》报告

专知会员服务

10+阅读 · 6月17日

相关VIP内容

TransMLA：多头潜在注意力（MLA）即为所需

TransMLA：多头潜在注意力（MLA）即为所需

专知会员服务

23+阅读 · 2025年2月13日

《多模态大语言模型视觉提示》综述

《多模态大语言模型视觉提示》综述

专知会员服务

36+阅读 · 2024年9月25日

大型语言模型的高效提示方法综述

大型语言模型的高效提示方法综述

专知会员服务

75+阅读 · 2024年4月2日

大模型时代还不理解自注意力(Self-Attention)？这篇文章教你从头写代码实现

大模型时代还不理解自注意力(Self-Attention)？这篇文章教你从头写代码实现

专知会员服务

36+阅读 · 2024年2月12日

如何提示？浙大最新《大型语言模型提示框架》综述

如何提示？浙大最新《大型语言模型提示框架》综述

专知会员服务

83+阅读 · 2023年11月23日

大模型中视觉指令调优怎么做？腾讯最新《视觉-语言指令调优》综述与分析

大模型中视觉指令调优怎么做？腾讯最新《视觉-语言指令调优》综述与分析

专知会员服务

45+阅读 · 2023年11月18日

《大型语言模型指令调优》综述

《大型语言模型指令调优》综述

专知会员服务

74+阅读 · 2023年8月27日

中科大腾讯最新《多模态大型语言模型》综述，详述多模态指令微调、上下文学习、思维链和辅助视觉推理技术

中科大腾讯最新《多模态大型语言模型》综述，详述多模态指令微调、上下文学习、思维链和辅助视觉推理技术

专知会员服务

105+阅读 · 2023年6月27日

【ICML2022】Branchformer:并行MLP-Attention架构，捕捉局部和全局上下文，用于语音识别和理解

【ICML2022】Branchformer:并行MLP-Attention架构，捕捉局部和全局上下文，用于语音识别和理解

专知会员服务

25+阅读 · 2022年7月8日

【清华大学】Delta调优:预训练语言模型参数有效方法的综合研究，Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models

【清华大学】Delta调优:预训练语言模型参数有效方法的综合研究，Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models

专知会员服务

26+阅读 · 2022年3月15日

热门VIP内容

开通专知VIP会员享更多权益服务

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

相关资讯

注意力机制 | 图卷积多跳注意力机制 | Direct multi-hop Attention based GNN

注意力机制 | 图卷积多跳注意力机制 | Direct multi-hop Attention based GNN

AINLP

22+阅读 · 2020年11月29日

【论文笔记】通过自注意力网络的动态图表示学习

【论文笔记】通过自注意力网络的动态图表示学习

专知

90+阅读 · 2019年12月2日

深度学习注意力机制-Attention in Deep learning-附101页PPT

深度学习注意力机制-Attention in Deep learning-附101页PPT

专知

139+阅读 · 2019年9月23日

Attention！注意力机制模型最新综述

Attention！注意力机制模型最新综述

专知

65+阅读 · 2019年4月8日

从Seq2seq到Attention模型到Self Attention（一）

从Seq2seq到Attention模型到Self Attention（一）

量化投资与机器学习

76+阅读 · 2018年10月8日

FAGAN：完全注意力机制（Full Attention）GAN，Self-attention+GAN

FAGAN：完全注意力机制（Full Attention）GAN，Self-attention+GAN

专知

32+阅读 · 2018年8月14日

干货 | NLP中的self-attention【自-注意力】机制

干货 | NLP中的self-attention【自-注意力】机制

机器学习算法与Python学习

12+阅读 · 2018年4月11日

自然语言处理中的自注意力机制（Self-Attention Mechanism）

自然语言处理中的自注意力机制（Self-Attention Mechanism）

PaperWeekly

22+阅读 · 2018年3月28日

原创 | Attention Modeling for Targeted Sentiment

原创 | Attention Modeling for Targeted Sentiment

黑龙江大学自然语言处理实验室

25+阅读 · 2017年11月5日

深度学习中的注意力机制

深度学习中的注意力机制

人工智能头条

16+阅读 · 2017年11月2日

相关论文

A Closer Look at Failure Modes in Temporal Understanding of Large Audio-Language Models

Arxiv

0+阅读 · 6月16日

EChO-Agent: Evidence Chain Orchestration Agent for Audio Reasoning

Arxiv

0+阅读 · 6月13日

From Senses to Decisions: The Information Flow of Auditory and Visual Perception in Multimodal LLMs

Arxiv

0+阅读 · 6月8日

SpeechJBB: Probing Safety Alignment and Comprehension in Large Audio Language Models under Code-Switched Speech

Arxiv

0+阅读 · 6月8日

TinyGiantALM: A Compact Audio-Language Model for Intent-Aware Reasoning under Resource Constraints

Arxiv

0+阅读 · 6月7日

SpeechEditBench: A Bilingual Multi-Attribute Benchmark for Instruction-Guided Speech Editing

Arxiv

0+阅读 · 6月3日

EvA: An Evidence-First Audio Understanding Paradigm for LALMs

Arxiv

0+阅读 · 5月28日

Learning When to Think While Listening in Large Audio-Language Models

Arxiv

0+阅读 · 5月26日

A Survey of Large Audio Language Models: Generalization, Trustworthiness, and Outlook

Arxiv

0+阅读 · 5月18日

SpotSound: Enhancing Large Audio-Language Models with Fine-Grained Temporal Grounding

Arxiv

0+阅读 · 4月14日

相关基金

基于因子分析的会话语音说话人识别研究

国家自然科学基金

1+阅读 · 2015年12月31日

面向CELP语音压缩域的通用隐写分析方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

听力损失系统双耳声源定位模型研究

国家自然科学基金

1+阅读 · 2015年12月31日

混沌时间序列Volterra建模及其在语音信号处理中的应用

国家自然科学基金

0+阅读 · 2015年12月31日

基于犹豫模糊语言信息的定性决策理论与方法

国家自然科学基金

2+阅读 · 2015年12月31日

深度学习框架下基于情境线索的视觉注意研究

国家自然科学基金

2+阅读 · 2015年12月31日

基于实时fMRI解码与脑网络建模的听觉信息认知加工机理研究

国家自然科学基金

0+阅读 · 2015年12月31日

精神压力下基于物理模型的变异语音生成机理探索及检测方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

双微阵列语音增强与定位方法研究

国家自然科学基金

0+阅读 · 2014年12月31日

语音识别中的稀疏性深度学习

国家自然科学基金

11+阅读 · 2012年12月31日

微信扫码咨询专知VIP会员