Keyless Attention: Value-Space Routing and Value-Only Caching for Efficient Transformers


Attention · Projection · cache · GPT-2 · Transformation


from arxiv, 14 pages, 4 figures

We propose Keyless Attention, an attention mechanism that eliminates the key projection entirely and operates over queries and values only. This yields a Value-Only Cache that reduces KV cache memory and access overhead by exactly 50% relative to standard attention, while matching or exceeding standard attention's decode throughput. Beyond efficiency, we introduce Depth-$m$ Attention Factorization: standard attention computes a depth-2 factorization of the attention bilinear form, while Keyless Attention realizes a depth-$m$ instance of this family. At $m=3$, Keyless Attention matches the projection matrix count of standard attention via a value-space routing matrix that replaces the key projection and introduces a coupling between routing and retrieval. Experiments across five models and four architectures (GPT-2 280M, GPT-2 557M, Pythia 410M, Qwen2 1.5B, and Llama 3.2 1B) show that Keyless Attention matches or outperforms standard QKV attention on perplexity in 4 out of 5 models. On downstream zero-shot evaluation (GPT-2 557M), Keyless Attention outperforms standard attention on 4 out of 5 commonsense reasoning benchmarks, while achieving a 50% KV cache reduction throughout.
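The abstract does not give the exact equations, so the following is only a minimal single-head sketch of one plausible reading: attention scores are formed from queries and values through a routing matrix, and only values are cached. The names `keyless_attention`, `cache_bytes`, `W_q`, `W_v`, and `R`, the scaling by $\sqrt{d}$, and the layer/token/width numbers are all assumptions made for illustration, not the paper's implementation.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax; masked entries (-inf) become exact zeros.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def keyless_attention(x, W_q, W_v, R, causal=True):
    """Hypothetical sketch: scores use queries and values only (no key projection)."""
    q = x @ W_q                       # (T, d) queries
    v = x @ W_v                       # (T, d) values: the only tensor cached
    # Assumed routing form: the routing matrix R is applied on the query side,
    # so scoring needs nothing from past tokens beyond the cached values.
    scores = (q @ R.T) @ v.T / np.sqrt(q.shape[-1])
    if causal:
        T = x.shape[0]
        future = np.triu(np.ones((T, T), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    # The same v is used for routing (scores) and retrieval (output).
    return softmax(scores) @ v, v

def cache_bytes(n_layers, n_tokens, d_model, tensors_per_token, bytes_per_elem=2):
    # Cache size = layers x tokens x width x bytes x number of cached tensors.
    return n_layers * n_tokens * d_model * bytes_per_elem * tensors_per_token

rng = np.random.default_rng(0)
T, d = 6, 8
x = rng.standard_normal((T, d))
W_q, W_v, R = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
out, v_cache = keyless_attention(x, W_q, W_v, R)

# Illustrative sizes only (24 layers, 4096 tokens, width 1024, 2-byte elements).
kv_cache = cache_bytes(24, 4096, 1024, tensors_per_token=2)      # keys + values
value_only = cache_bytes(24, 4096, 1024, tensors_per_token=1)    # values only
print(out.shape, kv_cache, value_only)
```

Under these assumptions the score between positions $i$ and $j$ is $x_i W_q R^\top W_v^\top x_j^\top$, a product of three learned matrices, and the cache stores one tensor per token instead of two, which is where the 50% figure comes from.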







