Efficient deployment of Deep Neural Networks (DNNs), such as Large Language Models (LLMs), on tensor accelerators is essential for maximizing computational efficiency in modern AI systems. However, achieving this is challenging due to the enormous and complex design space created by the interaction of intra-layer mapping and inter-layer fusion. In this work, we present FADiff, a gradient-based optimization framework that automatically identifies high-quality intra-layer mapping and inter-layer fusion strategies to accelerate inference for DNN workloads. We first construct a unified, differentiable analytical cost model that accurately predicts the energy and latency of both single-layer mappings and various layer fusion strategies. Then, by encoding the discrete constraints into the loss function, we employ a gradient-based approach to efficiently explore the vast design space and determine the optimal joint mapping and fusion strategy. Experimental results demonstrate that FADiff outperforms existing methods, achieving lower energy consumption and latency.
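To illustrate the kind of optimization the abstract describes, below is a minimal sketch of relaxing one discrete mapping choice (a single tiling factor) into a differentiable surrogate and descending an analytical cost with a penalty that nudges the solution back toward a discrete point. The candidate set, the toy cost terms, and the entropy penalty are illustrative assumptions for exposition only, not FADiff's actual cost model or constraint encoding.

```python
import torch

# Hypothetical sketch of gradient-based mapping search (illustrative only):
# a softmax relaxation over discrete tiling-factor candidates, a toy
# differentiable cost model, and a penalty encouraging a discrete choice.

# Candidate tiling factors for one loop dimension of a single layer.
tile_candidates = torch.tensor([1., 2., 4., 8., 16., 32.])

# Learnable logits define a soft (differentiable) choice over the candidates.
logits = torch.zeros(len(tile_candidates), requires_grad=True)

def soft_choice(logits, candidates, temperature=1.0):
    """Softmax relaxation: a differentiable surrogate for picking one candidate."""
    weights = torch.softmax(logits / temperature, dim=0)
    return (weights * candidates).sum(), weights

def analytical_cost(tile):
    """Toy differentiable cost: a latency/energy trade-off versus tile size."""
    compute = 1024.0 / tile      # larger tiles amortize per-tile overhead
    traffic = 0.05 * tile ** 2   # but increase on-chip buffer traffic/energy
    return compute + traffic

def constraint_penalty(weights):
    """Entropy of the soft choice; penalizing it favors a near-discrete solution."""
    return -(weights * torch.log(weights + 1e-9)).sum()

opt = torch.optim.Adam([logits], lr=0.1)
for step in range(300):
    tile, weights = soft_choice(logits, tile_candidates)
    loss = analytical_cost(tile) + 0.1 * constraint_penalty(weights)
    opt.zero_grad()
    loss.backward()
    opt.step()

# A hard argmax at the end recovers a valid discrete mapping choice.
best = tile_candidates[logits.argmax()]
print(f"selected tile size: {best.item():.0f}")
```

In this toy setting the relaxed optimum sits near a tile size of 16, so the final hard rounding recovers the best discrete candidate; FADiff's actual formulation jointly handles many such mapping and fusion decisions under its own constraint encoding.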