Explainable AI (XAI) has become critical as transformer-based models are deployed in high-stakes applications including healthcare, legal systems, and financial services, where opacity hinders trust and accountability. The self-attention mechanisms of transformers have proven valuable for interpretability, with attention weights successfully used to understand model focus and behavior (Xu et al., 2015; Wiegreffe and Pinter, 2019). However, existing attention-based explanation methods rely on manually defined aggregation strategies and fixed attribution rules (Abnar and Zuidema, 2020a; Chefer et al., 2021), while model-agnostic approaches such as LIME and SHAP treat the model as a black box and incur substantial computational cost through repeated input perturbation. We introduce the Explanation Network (ExpNet), a lightweight neural network that learns an explicit mapping from transformer attention patterns to token-level importance scores. Unlike prior methods, ExpNet discovers effective attention-feature combinations automatically rather than relying on predetermined rules. We evaluate ExpNet in a challenging cross-task setting and benchmark it against a broad spectrum of model-agnostic and attention-based techniques spanning four methodological families.
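To make the learned mapping concrete, the following is a minimal PyTorch sketch of one plausible instantiation: a small MLP that scores each token from the attention it receives across all layers and heads. The class name ExpNet comes from this work, but the feature extraction (mean attention received per token), the two-layer architecture, and all dimensions are illustrative assumptions, not the published design.

```python
import torch
import torch.nn as nn


class ExpNet(nn.Module):
    """Hypothetical sketch: maps per-token attention features to importance scores."""

    def __init__(self, num_layers: int, num_heads: int, hidden_dim: int = 64):
        super().__init__()
        # One feature per (layer, head) pair for each token (assumed featurization).
        in_dim = num_layers * num_heads
        self.scorer = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, attentions: torch.Tensor) -> torch.Tensor:
        # attentions: (num_layers, num_heads, seq_len, seq_len), rows sum to 1.
        # Average over query positions to get the attention each token receives.
        received = attentions.mean(dim=2)                 # (L, H, T)
        # Rearrange to per-token feature vectors: (T, L * H).
        feats = received.permute(2, 0, 1).flatten(start_dim=1)
        return self.scorer(feats).squeeze(-1)             # (T,) importance scores


# Usage with attention maps such as those returned by a HuggingFace
# transformer called with output_attentions=True (random tensors here):
L, H, T = 12, 12, 128
attn = torch.softmax(torch.randn(L, H, T, T), dim=-1)
scores = ExpNet(num_layers=L, num_heads=H)(attn)
print(scores.shape)  # torch.Size([128])
```

Because the scorer consumes precomputed attention tensors from a single forward pass, this design avoids the repeated perturbation passes that LIME- and SHAP-style methods require, which is the efficiency argument the paragraph above makes.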