A low-latency, energy-efficient tensor algebra accelerator design must optimize how data movement and computation are scheduled (i.e., mapped) onto the accelerator architecture. A key mapping optimization is fusion, i.e., holding intermediate data on-chip between computation steps of the workload, which has been shown to reduce energy and latency by eliminating expensive off-chip data movement. Because the optimal fusion choice depends on the workload and its shape, a mapper, which searches for the optimal mapping, can improve energy and latency significantly. However, prior mappers cannot find optimal mappings with fusion (i.e., fused mappings) in feasible runtime because the number of fused mappings to search grows exponentially with the number of computation steps in the workload. In this paper, we introduce the Fast and Fusiest Mapper (FFM), a mapper that quickly finds optimal mappings in a comprehensive fused mapspace for tensor algebra workloads. FFM shrinks the search space by pruning partial mappings (i.e., subsets of mappings) that are provably never part of any optimal mapping, thereby eliminating all suboptimal mappings that contain them. FFM then joins the surviving partial mappings to construct optimal fused mappings. Using FFM, we demonstrate an energy-delay-product (EDP) reduction of up to $1.8\times$ over TransFusion, a state-of-the-art accelerator with hand-optimized fusion. Moreover, we show that FFM finds mappings orders of magnitude faster ($>10{,}000\times$) than the prior automated mappers TileFlow and SET and, given the same runtime, reduces EDP by $>2\times$.
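The prune-then-join strategy described above can be sketched in miniature. In this hypothetical illustration (not FFM's actual algorithm or cost model), each partial mapping for one computation step is summarized by an (energy, latency) pair; partial mappings dominated in both metrics are pruned before per-step candidates are joined into full fused mappings, which keeps the combined candidate set from growing exponentially with the number of steps:

```python
# Hypothetical sketch of a prune-then-join mapping search.
# Each partial mapping is reduced to an (energy, latency) pair here;
# a real mapper like FFM tracks far richer per-mapping state.

def pareto_prune(candidates):
    """Keep only candidates not dominated in both energy and latency."""
    kept = []
    for cand in sorted(candidates):            # ascending energy, then latency
        if not kept or cand[1] < kept[-1][1]:  # strictly better latency survives
            kept.append(cand)
    return kept

def join(steps):
    """Join per-step partial mappings into full mappings, pruning as we go."""
    combined = [(0.0, 0.0)]
    for step in steps:
        step = pareto_prune(step)
        combined = pareto_prune(
            [(e1 + e2, l1 + l2) for e1, l1 in combined for e2, l2 in step]
        )
    return combined

# Toy workload: two computation steps, each with candidate partial mappings.
steps = [
    [(3.0, 5.0), (4.0, 2.0), (6.0, 6.0)],  # (6,6) is dominated by (3,5)
    [(1.0, 4.0), (2.0, 1.0)],
]
frontier = join(steps)
best_edp = min(energy * latency for energy, latency in frontier)
```

Pruning a dominated partial mapping here eliminates every full mapping built from it, which is the mechanism by which the search stays tractable as steps are added.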