Recasting Self-Attention with Holographic Reduced Representations - 专知论文

会员服务 ·

0

可约的 · 情景 · 变换 · Learning · 表示 ·

2023 年 5 月 31 日

Recasting Self-Attention with Holographic Reduced Representations

翻译：用全息简化表征重新诠释自注意力机制

Mohammad Mahmudul Alam,Edward Raff,Stella Biderman,Tim Oates,James Holt

from arxiv, To appear in Proceedings of the 40th International Conference on Machine Learning (ICML)

In recent years, self-attention has become the dominant paradigm for sequence modeling in a variety of domains. However, in domains with very long sequence lengths the $\mathcal{O}(T^2)$ memory and $\mathcal{O}(T^2 H)$ compute costs can make using transformers infeasible. Motivated by problems in malware detection, where sequence lengths of $T \geq 100,000$ are a roadblock to deep learning, we re-cast self-attention using the neuro-symbolic approach of Holographic Reduced Representations (HRR). In doing so we perform the same high-level strategy of the standard self-attention: a set of queries matching against a set of keys, and returning a weighted response of the values for each key. Implemented as a ``Hrrformer'' we obtain several benefits including $\mathcal{O}(T H \log H)$ time complexity, $\mathcal{O}(T H)$ space complexity, and convergence in $10\times$ fewer epochs. Nevertheless, the Hrrformer achieves near state-of-the-art accuracy on LRA benchmarks and we are able to learn with just a single layer. Combined, these benefits make our Hrrformer the first viable Transformer for such long malware classification sequences and up to $280\times$ faster to train on the Long Range Arena benchmark. Code is available at \url{https://github.com/NeuromorphicComputationResearchProgram/Hrrformer}

翻译：近年来，自注意力机制已成为多个领域中序列建模的主导范式。然而，在序列长度极长的场景下，$\mathcal{O}(T^2)$的空间复杂度与$\mathcal{O}(T^2 H)$的计算复杂度使得Transformer难以实际应用。受恶意软件检测问题的驱动（其中$T \geq 100,000$的序列长度构成深度学习的障碍），我们采用神经符号方法——全息简化表征（HRR），对自注意力机制进行重构。通过该方法，我们实现了与标准自注意力相同的高层策略：一组查询与一组键进行匹配，并返回每个键对应值的加权响应。所实现的"Hrrformer"具有多项优势，包括$\mathcal{O}(T H \log H)$的时间复杂度、$\mathcal{O}(T H)$的空间复杂度，以及收敛速度提升10倍的训练效率。尽管如此，Hrrformer在LRA基准测试中仍能达到接近最优的准确率，且仅需单层即可完成学习。综合这些优势，我们的Hrrformer成为首个能够有效处理超长恶意软件分类序列的Transformer变体，在Long Range Arena基准测试上的训练速度最高提升280倍。代码已开源：\url{https://github.com/NeuromorphicComputationResearchProgram/Hrrformer}

0

相关内容

可约的

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【超赞的#C++#速查&信息图】“hacking c++ - Cheat Sheets & Infographics”

【超赞的#C++#速查&信息图】“hacking c++ - Cheat Sheets & Infographics”

专知会员服务

30+阅读 · 2022年3月8日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

61+阅读 · 2019年10月17日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

84+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

利用动态深度学习预测金融时间序列基于Python

利用动态深度学习预测金融时间序列基于Python

量化投资与机器学习

18+阅读 · 2018年10月30日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【论文推荐】最新五篇命名实体识别相关论文—深度主动学习、Lattice LSTM、混合马尔可夫CRF

【论文推荐】最新五篇命名实体识别相关论文—深度主动学习、Lattice LSTM、混合马尔可夫CRF

专知

26+阅读 · 2018年5月22日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

多模态MRI探索宫颈癌侵袭性及同步放化疗疗效评估的应用研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于信息熵和DCS的多基线SAR干涉理论与新方法研究

国家自然科学基金

0+阅读 · 2014年12月31日

空间插值的微分几何方法研究

国家自然科学基金

0+阅读 · 2014年12月31日

目标跟踪中的时空上下文建模方法研究

国家自然科学基金

2+阅读 · 2013年12月31日

Kronheimer-Nakajima quiver 模空间与有理曲面

国家自然科学基金

1+阅读 · 2013年12月31日

大规模测量平差分布式计算技术研究

国家自然科学基金

0+阅读 · 2012年12月31日

实时安全关键系统的建模、仿真与验证

国家自然科学基金

1+阅读 · 2012年12月31日

基于机器视觉的水处理絮凝过程中絮体检测与控制模型研究

国家自然科学基金

0+阅读 · 2012年12月31日

气溶胶对夜间大气化学过程和NOx清除率影响研究

国家自然科学基金

0+阅读 · 2009年12月31日

GmMADS1在大豆花发育中的调控机理研究

国家自然科学基金

0+阅读 · 2008年12月31日

Learned Thresholds Token Merging and Pruning for Vision Transformers

Arxiv

0+阅读 · 2023年7月20日

Representing Random Utility Choice Models with Neural Networks

Arxiv

0+阅读 · 2023年7月19日

EigenTrajectory: Low-Rank Descriptors for Multi-Modal Trajectory Forecasting

Arxiv

0+阅读 · 2023年7月18日

Light-Weight Vision Transformer with Parallel Local and Global Self-Attention

Arxiv

0+阅读 · 2023年7月18日

Attention over pre-trained Sentence Embeddings for Long Document Classification

Arxiv

0+阅读 · 2023年7月18日

Retrieving Continuous Time Event Sequences using Neural Temporal Point Processes with Learnable Hashing

Arxiv

0+阅读 · 2023年7月13日

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Arxiv

21+阅读 · 2020年12月17日

K-BERT: Enabling Language Representation with Knowledge Graph

K-BERT: Enabling Language Representation with Knowledge Graph

Arxiv

19+阅读 · 2019年9月17日

Dynamic Graph Representation Learning via Self-Attention Networks

Arxiv

52+阅读 · 2019年6月15日

Hierarchical Graph Representation Learning with Differentiable Pooling

Hierarchical Graph Representation Learning with Differentiable Pooling

Arxiv

15+阅读 · 2018年6月26日

VIP会员

文章信息

相关主题

最新内容

博士论文 | 面向大模型推理的内存高效算法

博士论文 | 面向大模型推理的内存高效算法

专知会员服务

0+阅读 · 今天15:20

论文解读 | 从预训练到后训练：理解大模型推理能力如何形成

论文解读 | 从预训练到后训练：理解大模型推理能力如何形成

专知会员服务

0+阅读 · 今天15:18

《无人系统互操作性导论——无人系统联合架构（JAUS）》

《无人系统互操作性导论——无人系统联合架构（JAUS）》

专知会员服务

8+阅读 · 今天5:53

美空军新型反无人机部队初探

美空军新型反无人机部队初探

专知会员服务

4+阅读 · 今天5:45

《对抗性电磁环境下远程巡飞弹作战的安全指挥与控制数据链》

《对抗性电磁环境下远程巡飞弹作战的安全指挥与控制数据链》

专知会员服务

2+阅读 · 今天5:23

《北约下一代建模与仿真（NexGen M&S）计划》2026年69页

《北约下一代建模与仿真（NexGen M&S）计划》2026年69页

专知会员服务

2+阅读 · 今天5:11

《防空交战流程的概率建模研究》

《防空交战流程的概率建模研究》

专知会员服务

6+阅读 · 今天5:04

ICML 2026 教程 | 数值优化理论还重要吗？

ICML 2026 教程 | 数值优化理论还重要吗？

专知会员服务

5+阅读 · 7月26日

ICM 2026 | 陶哲轩：人工智能时代的数学

ICM 2026 | 陶哲轩：人工智能时代的数学

专知会员服务

9+阅读 · 7月26日

《面向可扩展高韧性无人机集群网络的速度感知分层通信框架》

《面向可扩展高韧性无人机集群网络的速度感知分层通信框架》

专知会员服务

8+阅读 · 7月26日

《面向概率推理的可定制战术引擎及其在军事任务规划中的应用》

《面向概率推理的可定制战术引擎及其在军事任务规划中的应用》

专知会员服务

10+阅读 · 7月26日

《先进防空系统选型战略框架：基于巴基斯坦的实证启示》

《先进防空系统选型战略框架：基于巴基斯坦的实证启示》

专知会员服务

8+阅读 · 7月26日

《反无人机交战场景下的战斗归零研究》

《反无人机交战场景下的战斗归零研究》

专知会员服务

7+阅读 · 7月26日

霍尔木兹与不对称作战时代：水雷、无人系统与海军力量的重新定义

霍尔木兹与不对称作战时代：水雷、无人系统与海军力量的重新定义

专知会员服务

4+阅读 · 7月26日

博士论文 | 用代码结构感知方法推进代码大模型

博士论文 | 用代码结构感知方法推进代码大模型

专知会员服务

5+阅读 · 7月25日

相关VIP内容

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【超赞的#C++#速查&信息图】“hacking c++ - Cheat Sheets & Infographics”

【超赞的#C++#速查&信息图】“hacking c++ - Cheat Sheets & Infographics”

专知会员服务

30+阅读 · 2022年3月8日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

61+阅读 · 2019年10月17日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

84+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

论文解读 | 从预训练到后训练：理解大模型推理能力如何形成

美空军新型反无人机部队初探

博士论文 | 面向大模型推理的内存高效算法

《无人系统互操作性导论——无人系统联合架构（JAUS）》

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

利用动态深度学习预测金融时间序列基于Python

利用动态深度学习预测金融时间序列基于Python

量化投资与机器学习

18+阅读 · 2018年10月30日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【论文推荐】最新五篇命名实体识别相关论文—深度主动学习、Lattice LSTM、混合马尔可夫CRF

【论文推荐】最新五篇命名实体识别相关论文—深度主动学习、Lattice LSTM、混合马尔可夫CRF

专知

26+阅读 · 2018年5月22日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

相关论文

Learned Thresholds Token Merging and Pruning for Vision Transformers

Arxiv

0+阅读 · 2023年7月20日

Representing Random Utility Choice Models with Neural Networks

Arxiv

0+阅读 · 2023年7月19日

EigenTrajectory: Low-Rank Descriptors for Multi-Modal Trajectory Forecasting

Arxiv

0+阅读 · 2023年7月18日

Light-Weight Vision Transformer with Parallel Local and Global Self-Attention

Arxiv

0+阅读 · 2023年7月18日

Attention over pre-trained Sentence Embeddings for Long Document Classification

Arxiv

0+阅读 · 2023年7月18日

Retrieving Continuous Time Event Sequences using Neural Temporal Point Processes with Learnable Hashing

Arxiv

0+阅读 · 2023年7月13日

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Arxiv

21+阅读 · 2020年12月17日

K-BERT: Enabling Language Representation with Knowledge Graph

K-BERT: Enabling Language Representation with Knowledge Graph

Arxiv

19+阅读 · 2019年9月17日

Dynamic Graph Representation Learning via Self-Attention Networks

Arxiv

52+阅读 · 2019年6月15日

Hierarchical Graph Representation Learning with Differentiable Pooling

Hierarchical Graph Representation Learning with Differentiable Pooling

Arxiv

15+阅读 · 2018年6月26日

相关基金

多模态MRI探索宫颈癌侵袭性及同步放化疗疗效评估的应用研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于信息熵和DCS的多基线SAR干涉理论与新方法研究

国家自然科学基金

0+阅读 · 2014年12月31日

空间插值的微分几何方法研究

国家自然科学基金

0+阅读 · 2014年12月31日

目标跟踪中的时空上下文建模方法研究

国家自然科学基金

2+阅读 · 2013年12月31日

Kronheimer-Nakajima quiver 模空间与有理曲面

国家自然科学基金

1+阅读 · 2013年12月31日

大规模测量平差分布式计算技术研究

国家自然科学基金

0+阅读 · 2012年12月31日

实时安全关键系统的建模、仿真与验证

国家自然科学基金

1+阅读 · 2012年12月31日

基于机器视觉的水处理絮凝过程中絮体检测与控制模型研究

国家自然科学基金

0+阅读 · 2012年12月31日

气溶胶对夜间大气化学过程和NOx清除率影响研究

国家自然科学基金

0+阅读 · 2009年12月31日

GmMADS1在大豆花发育中的调控机理研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员