Fast Cross-Operator Optimization of Attention Dataflow - 专知论文

会员服务 ·

0

优化器 · Attention · Dataflow · 最优化 · FAST ·

Fast Cross-Operator Optimization of Attention Dataflow

翻译：暂无翻译

Haodong Chang,Hailiang Hu,Zhenrui Wang,Yu Gong,Rongjian Liang,Zhexiang Tang,Bo Yuan,Jiang Hu

Attention is a fundamental computational kernel that accounts for the majority of the workload in transformer and LLM computing. Optimizing dataflow is crucial for enhancing both performance and energy efficiency in attention computation. This optimization involves a range of decisions, such as tiling, computation ordering and buffer management, and can be applied at both intra-operator and inter-operator levels, resulting in a highly complex decision space. We propose a new approach to cross-operator dataflow optimization. Its centerpiece is an analytical performance model that spans a large decision space and enables matrix-based encoding of multiple candidate solutions. Built on this foundation, a vast number of solutions can be evaluated rapidly, and with the aid of an effective pruning technique, the optimal solution can be identified through exhaustive enumeration. We refer to our method as MMEE (Matrix Multiplication Encoded Enumeration). The ability to efficiently enumerate a large design space allows MMEE to deliver higher-quality solutions at a substantially faster speed compared to prior approaches. The MMEE approach is evaluated across various test cases for different accelerator configurations. For energy-driven optimization, MMEE reduces energy consumption by 48%-50% and latency by 31%-69%, compared to state-of-the-art methods. For latency-driven optimization, MMEE achieves simultaneous reductions of 40%-50% in energy consumption and 40%-69% in latency, respectively. Additionally, MMEE is $64\times$ to $343\times$ faster than previous works.

翻译：暂无翻译

0

相关内容

优化器

金融业AI大模型智算网络研究报告

金融业AI大模型智算网络研究报告

专知会员服务

18+阅读 · 2025年6月1日

【Google】高效Transformer综述，Efficient Transformers: A Survey

【Google】高效Transformer综述，Efficient Transformers: A Survey

专知会员服务

66+阅读 · 2022年3月17日

清华大学提出ACmix | 这才是Self-Attention与CNN正确的融合范式，性能速度全面提升

清华大学提出ACmix | 这才是Self-Attention与CNN正确的融合范式，性能速度全面提升

专知会员服务

27+阅读 · 2021年12月3日

Transformer模型-深度学习自然语言处理，17页ppt

Transformer模型-深度学习自然语言处理，17页ppt

专知会员服务

108+阅读 · 2020年8月30日

TensorFlowLite:端侧机器学习框架

TensorFlowLite:端侧机器学习框架

专知会员服务

33+阅读 · 2020年8月27日

【AAAI 2019】双曲异构信息网络嵌入，Hyperbolic Heterogeneous Information Network Embedding

【AAAI 2019】双曲异构信息网络嵌入，Hyperbolic Heterogeneous Information Network Embedding

专知会员服务

60+阅读 · 2020年6月28日

【2020论文翻译】基于SARSA的深度强化学习的移动边缘计算任务分流和资源分配

【2020论文翻译】基于SARSA的深度强化学习的移动边缘计算任务分流和资源分配

专知会员服务

21+阅读 · 2020年5月20日

【CVPR2020】强化特征点，Reinforced Feature Points: Optimizing Feature Detection and Description for a High-Level Task

【CVPR2020】强化特征点，Reinforced Feature Points: Optimizing Feature Detection and Description for a High-Level Task

专知会员服务

49+阅读 · 2020年2月25日

Google AI博客解读论文《Reformer: The Efficient Transformer》，百万量级注意力机制

Google AI博客解读论文《Reformer: The Efficient Transformer》，百万量级注意力机制

专知会员服务

71+阅读 · 2020年1月17日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Transformer模型-深度学习自然语言处理，17页ppt

Transformer模型-深度学习自然语言处理，17页ppt

专知

14+阅读 · 2020年8月30日

一文读懂Attention机制

一文读懂Attention机制

机器学习与推荐算法

63+阅读 · 2020年6月9日

初学者系列：Attentional Factorization Machines（AFM）详解

初学者系列：Attentional Factorization Machines（AFM）详解

专知

82+阅读 · 2019年9月16日

TensorFlow 2.0官方Transformer教程 (Attention is All you Need)

TensorFlow 2.0官方Transformer教程 (Attention is All you Need)

专知

54+阅读 · 2019年4月12日

深度学习中Attention Mechanism详细介绍：原理、分类及应用

深度学习中Attention Mechanism详细介绍：原理、分类及应用

深度学习与NLP

10+阅读 · 2019年2月18日

【CVPR2018】如何增强Attention Model的推理能力

【CVPR2018】如何增强Attention Model的推理能力

专知

15+阅读 · 2018年7月2日

【论文推荐】最新七篇图像分割相关论文—Attention U-Net、对抗结构匹配损失、卷积CRFs、对抗样本、弱监督分割

【论文推荐】最新七篇图像分割相关论文—Attention U-Net、对抗结构匹配损失、卷积CRFs、对抗样本、弱监督分割

专知

19+阅读 · 2018年5月31日

模型汇总24 - 深度学习中Attention Mechanism详细介绍：原理、分类及应用

模型汇总24 - 深度学习中Attention Mechanism详细介绍：原理、分类及应用

深度学习与NLP

12+阅读 · 2017年11月30日

TensorFlow seq2seq中的Attention机制（续）

TensorFlow seq2seq中的Attention机制（续）

深度学习每日摘要

15+阅读 · 2017年11月16日

Tensorflow卷积神经网络

Tensorflow卷积神经网络

全球人工智能

13+阅读 · 2017年10月14日

高动态方向性多跳自组网传输调度理论研究与实现

国家自然科学基金

1+阅读 · 2015年12月31日

基于大数据的运载火箭总装系统智能优化调度理论与方法

国家自然科学基金

0+阅读 · 2015年12月31日

云计算环境下面向大数据的在线聚集并行优化机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

面向有源配电网的数据传输优化及智能过滤机制

国家自然科学基金

0+阅读 · 2015年12月31日

面向云数据中心应用感知的参与式资源调度技术研究

国家自然科学基金

1+阅读 · 2015年12月31日

基于日常移动平台的用户状态感知与软件协同技术研究

国家自然科学基金

0+阅读 · 2015年12月31日

普适计算对象感知多模态不精确性数据融合算法研究

国家自然科学基金

5+阅读 · 2014年12月31日

面向大数据的高时效并行计算机系统结构与技术

国家自然科学基金

0+阅读 · 2014年12月31日

面向大数据计算的高吞吐量众核处理器关键技术研究

国家自然科学基金

0+阅读 · 2014年12月31日

面向类脑计算存储器的调制编码理论及方法研究

国家自然科学基金

1+阅读 · 2014年12月31日

Learning to Trust AI and Data-driven models in Data Assimilation through a Multifidelity Ensemble Gaussian Mixture Filter Framework

Arxiv

0+阅读 · 4月24日

Quantifying Data Similarity Using Cross Learning

Arxiv

0+阅读 · 4月21日

AsyncSparse: Accelerating Sparse Matrix-Matrix Multiplication on Asynchronous GPU Architectures

Arxiv

0+阅读 · 4月20日

Hierarchical Kernel Transformer: Multi-Scale Attention with an Information-Theoretic Approximation Analysis

Arxiv

0+阅读 · 4月10日

Frequency Matters: Fast Model-Agnostic Data Curation for Pruning and Quantization

Arxiv

0+阅读 · 4月7日

Fast and memory-efficient classical simulation of quantum machine learning via forward and backward gate fusion

Arxiv

0+阅读 · 4月3日

FlatAttention: Dataflow and Fabric Collectives Co-Optimization for Large Attention-Based Model Inference on Tile-Based Accelerators

Arxiv

0+阅读 · 4月2日

How iteration order influences convergence and stability in deep learning

Arxiv

0+阅读 · 3月27日

GenOpticalFlow: A Generative Approach to Unsupervised Optical Flow Learning

Arxiv

0+阅读 · 3月23日

Attention, please! A survey of Neural Attention Models in Deep Learning

Arxiv

59+阅读 · 2021年3月31日

VIP会员

文章信息

相关主题

最新内容

DeepSeek 版Claude Code，免费小白安装教程来了！

DeepSeek 版Claude Code，免费小白安装教程来了！

专知会员服务

7+阅读 · 5月5日

【ICML Spotlight 2026】 T²PO: 不确定性引导的探索控制框架，实现稳定多轮Agentic强化学习

【ICML Spotlight 2026】 T²PO: 不确定性引导的探索控制框架，实现稳定多轮Agentic强化学习

专知会员服务

3+阅读 · 5月5日

基础模型驱动的工业智能体：技术成熟度、能力变迁与未竟之挑战

基础模型驱动的工业智能体：技术成熟度、能力变迁与未竟之挑战

专知会员服务

3+阅读 · 5月5日

《机动炮兵的演进与未来：技术进步、历史沿革与炮兵作战前瞻》

《机动炮兵的演进与未来：技术进步、历史沿革与炮兵作战前瞻》

专知会员服务

4+阅读 · 5月5日

《火炮弹药快速效能建模：提升互操作性与技术优势》（报告）

《火炮弹药快速效能建模：提升互操作性与技术优势》（报告）

专知会员服务

6+阅读 · 5月5日

《美空军条令出版物 2-0：情报（2026版）》

《美空军条令出版物 2-0：情报（2026版）》

专知会员服务

12+阅读 · 5月5日

美陆军“飞蝇陷阱5.0”项目将新兴技术交到作战人员手中

美陆军“飞蝇陷阱5.0”项目将新兴技术交到作战人员手中

专知会员服务

4+阅读 · 5月5日

帕兰提尔 Gotham：一个游戏规则改变器

帕兰提尔 Gotham：一个游戏规则改变器

专知会员服务

6+阅读 · 5月5日

【ICML 2026】用测试时训练线性化视觉Transformer：T⁵ 实现 Softmax 注意力到线性复杂度的快速转换

【ICML 2026】用测试时训练线性化视觉Transformer：T⁵ 实现 Softmax 注意力到线性复杂度的快速转换

专知会员服务

2+阅读 · 5月5日

【AAAI 2026】大模型做知识蒸馏：CMM将LLM特征拆解给小模型协同学习

【AAAI 2026】大模型做知识蒸馏：CMM将LLM特征拆解给小模型协同学习

专知会员服务

2+阅读 · 5月5日

【ICML Spotlight 2026 】NonZero：交互引导探索的多智能体蒙特卡洛树搜索

【ICML Spotlight 2026 】NonZero：交互引导探索的多智能体蒙特卡洛树搜索

专知会员服务

8+阅读 · 5月4日

【综述】机器人学习中的世界模型：全面综述

【综述】机器人学习中的世界模型：全面综述

专知会员服务

11+阅读 · 5月4日

伊朗的导弹-无人机行动及其对美国威慑的影响

伊朗的导弹-无人机行动及其对美国威慑的影响

专知会员服务

9+阅读 · 5月4日

《未来战术无人机系统案例研究：量身定制采办策略方法》100页报告

《未来战术无人机系统案例研究：量身定制采办策略方法》100页报告

专知会员服务

9+阅读 · 5月4日

战争贩子：2026年第一季度美国对中东潜在军售激增

战争贩子：2026年第一季度美国对中东潜在军售激增

专知会员服务

7+阅读 · 5月4日

相关VIP内容

金融业AI大模型智算网络研究报告

金融业AI大模型智算网络研究报告

专知会员服务

18+阅读 · 2025年6月1日

【Google】高效Transformer综述，Efficient Transformers: A Survey

【Google】高效Transformer综述，Efficient Transformers: A Survey

专知会员服务

66+阅读 · 2022年3月17日

清华大学提出ACmix | 这才是Self-Attention与CNN正确的融合范式，性能速度全面提升

清华大学提出ACmix | 这才是Self-Attention与CNN正确的融合范式，性能速度全面提升

专知会员服务

27+阅读 · 2021年12月3日

Transformer模型-深度学习自然语言处理，17页ppt

Transformer模型-深度学习自然语言处理，17页ppt

专知会员服务

108+阅读 · 2020年8月30日

TensorFlowLite:端侧机器学习框架

TensorFlowLite:端侧机器学习框架

专知会员服务

33+阅读 · 2020年8月27日

【AAAI 2019】双曲异构信息网络嵌入，Hyperbolic Heterogeneous Information Network Embedding

【AAAI 2019】双曲异构信息网络嵌入，Hyperbolic Heterogeneous Information Network Embedding

专知会员服务

60+阅读 · 2020年6月28日

【2020论文翻译】基于SARSA的深度强化学习的移动边缘计算任务分流和资源分配

【2020论文翻译】基于SARSA的深度强化学习的移动边缘计算任务分流和资源分配

专知会员服务

21+阅读 · 2020年5月20日

【CVPR2020】强化特征点，Reinforced Feature Points: Optimizing Feature Detection and Description for a High-Level Task

【CVPR2020】强化特征点，Reinforced Feature Points: Optimizing Feature Detection and Description for a High-Level Task

专知会员服务

49+阅读 · 2020年2月25日

Google AI博客解读论文《Reformer: The Efficient Transformer》，百万量级注意力机制

Google AI博客解读论文《Reformer: The Efficient Transformer》，百万量级注意力机制

专知会员服务

71+阅读 · 2020年1月17日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

【ICML Spotlight 2026】 T²PO: 不确定性引导的探索控制框架，实现稳定多轮Agentic强化学习

《机动炮兵的演进与未来：技术进步、历史沿革与炮兵作战前瞻》

DeepSeek 版Claude Code，免费小白安装教程来了！

基础模型驱动的工业智能体：技术成熟度、能力变迁与未竟之挑战

相关资讯

Transformer模型-深度学习自然语言处理，17页ppt

Transformer模型-深度学习自然语言处理，17页ppt

专知

14+阅读 · 2020年8月30日

一文读懂Attention机制

一文读懂Attention机制

机器学习与推荐算法

63+阅读 · 2020年6月9日

初学者系列：Attentional Factorization Machines（AFM）详解

初学者系列：Attentional Factorization Machines（AFM）详解

专知

82+阅读 · 2019年9月16日

TensorFlow 2.0官方Transformer教程 (Attention is All you Need)

TensorFlow 2.0官方Transformer教程 (Attention is All you Need)

专知

54+阅读 · 2019年4月12日

深度学习中Attention Mechanism详细介绍：原理、分类及应用

深度学习中Attention Mechanism详细介绍：原理、分类及应用

深度学习与NLP

10+阅读 · 2019年2月18日

【CVPR2018】如何增强Attention Model的推理能力

【CVPR2018】如何增强Attention Model的推理能力

专知

15+阅读 · 2018年7月2日

【论文推荐】最新七篇图像分割相关论文—Attention U-Net、对抗结构匹配损失、卷积CRFs、对抗样本、弱监督分割

【论文推荐】最新七篇图像分割相关论文—Attention U-Net、对抗结构匹配损失、卷积CRFs、对抗样本、弱监督分割

专知

19+阅读 · 2018年5月31日

模型汇总24 - 深度学习中Attention Mechanism详细介绍：原理、分类及应用

模型汇总24 - 深度学习中Attention Mechanism详细介绍：原理、分类及应用

深度学习与NLP

12+阅读 · 2017年11月30日

TensorFlow seq2seq中的Attention机制（续）

TensorFlow seq2seq中的Attention机制（续）

深度学习每日摘要

15+阅读 · 2017年11月16日

Tensorflow卷积神经网络

Tensorflow卷积神经网络

全球人工智能

13+阅读 · 2017年10月14日

相关论文

Learning to Trust AI and Data-driven models in Data Assimilation through a Multifidelity Ensemble Gaussian Mixture Filter Framework

Arxiv

0+阅读 · 4月24日

Quantifying Data Similarity Using Cross Learning

Arxiv

0+阅读 · 4月21日

AsyncSparse: Accelerating Sparse Matrix-Matrix Multiplication on Asynchronous GPU Architectures

Arxiv

0+阅读 · 4月20日

Hierarchical Kernel Transformer: Multi-Scale Attention with an Information-Theoretic Approximation Analysis

Arxiv

0+阅读 · 4月10日

Frequency Matters: Fast Model-Agnostic Data Curation for Pruning and Quantization

Arxiv

0+阅读 · 4月7日

Fast and memory-efficient classical simulation of quantum machine learning via forward and backward gate fusion

Arxiv

0+阅读 · 4月3日

FlatAttention: Dataflow and Fabric Collectives Co-Optimization for Large Attention-Based Model Inference on Tile-Based Accelerators

Arxiv

0+阅读 · 4月2日

How iteration order influences convergence and stability in deep learning

Arxiv

0+阅读 · 3月27日

GenOpticalFlow: A Generative Approach to Unsupervised Optical Flow Learning

Arxiv

0+阅读 · 3月23日

Attention, please! A survey of Neural Attention Models in Deep Learning

Arxiv

59+阅读 · 2021年3月31日

相关基金

高动态方向性多跳自组网传输调度理论研究与实现

国家自然科学基金

1+阅读 · 2015年12月31日

基于大数据的运载火箭总装系统智能优化调度理论与方法

国家自然科学基金

0+阅读 · 2015年12月31日

云计算环境下面向大数据的在线聚集并行优化机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

面向有源配电网的数据传输优化及智能过滤机制

国家自然科学基金

0+阅读 · 2015年12月31日

面向云数据中心应用感知的参与式资源调度技术研究

国家自然科学基金

1+阅读 · 2015年12月31日

基于日常移动平台的用户状态感知与软件协同技术研究

国家自然科学基金

0+阅读 · 2015年12月31日

普适计算对象感知多模态不精确性数据融合算法研究

国家自然科学基金

5+阅读 · 2014年12月31日

面向大数据的高时效并行计算机系统结构与技术

国家自然科学基金

0+阅读 · 2014年12月31日

面向大数据计算的高吞吐量众核处理器关键技术研究

国家自然科学基金

0+阅读 · 2014年12月31日

面向类脑计算存储器的调制编码理论及方法研究

国家自然科学基金

1+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员