Accelerating Neural Network Inference with Processing-in-DRAM: From the Edge to the Cloud - 专知论文

会员服务 ·

0

能源效率 · 内存 · 边 · 神经网络 · GPU ·

2023 年 3 月 27 日

Accelerating Neural Network Inference with Processing-in-DRAM: From the Edge to the Cloud

翻译：面向边缘到云端场景的神经网络推理加速：基于DRAM的处理架构

Geraldo F. Oliveira,Juan Gómez-Luna,Saugata Ghose,Amirali Boroumand,Onur Mutlu

from arxiv, This is an extended and updated version of a paper published in IEEE Micro, pp. 1-14, 29 Aug. 2022. arXiv admin note: text overlap with arXiv:2109.14320

Neural networks (NNs) are growing in importance and complexity. A neural network's performance (and energy efficiency) can be bound either by computation or memory resources. The processing-in-memory (PIM) paradigm, where computation is placed near or within memory arrays, is a viable solution to accelerate memory-bound NNs. However, PIM architectures vary in form, where different PIM approaches lead to different trade-offs. Our goal is to analyze, discuss, and contrast DRAM-based PIM architectures for NN performance and energy efficiency. To do so, we analyze three state-of-the-art PIM architectures: (1) UPMEM, which integrates processors and DRAM arrays into a single 2D chip; (2) Mensa, a 3D-stack-based PIM architecture tailored for edge devices; and (3) SIMDRAM, which uses the analog principles of DRAM to execute bit-serial operations. Our analysis reveals that PIM greatly benefits memory-bound NNs: (1) UPMEM provides 23x the performance of a high-end GPU when the GPU requires memory oversubscription for a general matrix-vector multiplication kernel; (2) Mensa improves energy efficiency and throughput by 3.0x and 3.1x over the Google Edge TPU for 24 Google edge NN models; and (3) SIMDRAM outperforms a CPU/GPU by 16.7x/1.4x for three binary NNs. We conclude that the ideal PIM architecture for NN models depends on a model's distinct attributes, due to the inherent architectural design choices.

翻译：神经网络（NN）的重要性与复杂性日益增长。神经网络的性能（及能效）可能受限于计算资源或内存资源。处理-内存（PIM）范式将计算单元置于内存阵列附近或内部，成为加速内存受限型神经网络的有效解决方案。然而，PIM架构形态各异，不同PIM方法导致不同的性能权衡。本文旨在分析、讨论并对比基于DRAM的PIM架构对神经网络性能与能效的影响。为此，我们研究了三种前沿PIM架构：（1）UPMEM——将处理器与DRAM阵列集成于单个2D芯片；（2）Mensa——面向边缘设备的3D堆叠PIM架构；（3）SIMDRAM——利用DRAM模拟原理执行按位串行运算。分析表明：PIM能显著提升内存受限型神经网络的性能——（1）在GPU需进行内存超额订阅的通用矩阵-向量乘核运算中，UPMEM的性能达高端GPU的23倍；（2）在24个Google边缘神经网络模型上，Mensa的能效与吞吐量分别较Google Edge TPU提升3.0倍与3.1倍；（3）在三个二元神经网络上，SIMDRAM的处理性能较CPU/GPU分别提升16.7倍/1.4倍。我们得出结论：受固有架构设计选择影响，神经网络模型的最优PIM架构取决于模型的具体属性特征。

0

相关内容

能源效率

深度学习如何用在边缘设备上？苏黎世联邦理工Zhongnan博士论文《在边缘设备上使用深度学习》，182页pdf

深度学习如何用在边缘设备上？苏黎世联邦理工Zhongnan博士论文《在边缘设备上使用深度学习》，182页pdf

专知会员服务

92+阅读 · 2022年10月22日

【伯克利博士论文】硬件感知的高效深度学习，154页pdf

【伯克利博士论文】硬件感知的高效深度学习，154页pdf

专知会员服务

75+阅读 · 2022年10月20日

【深度神经网络加速器的硬件近似技术综述】Hardware Approximate Techniques for Deep Neural Network Accelerators: A Survey

【深度神经网络加速器的硬件近似技术综述】Hardware Approximate Techniques for Deep Neural Network Accelerators: A Survey

专知会员服务

16+阅读 · 2022年3月17日

【2021新书】并行高性能计算，705页pdf，Parallel and High Performance Computing

【2021新书】并行高性能计算，705页pdf，Parallel and High Performance Computing

专知会员服务

108+阅读 · 2021年10月30日

【伯克利】高效神经网络推理的量化方法综述论文

专知会员服务

51+阅读 · 2021年6月28日

可解释高效异构图卷积网络，Interpretable and Efficient Heterogeneous Graph Convolutional Network

可解释高效异构图卷积网络，Interpretable and Efficient Heterogeneous Graph Convolutional Network

专知会员服务

63+阅读 · 2020年7月12日

【UCLA-微软-WWW2020】异构图Transformer，Heterogeneous Graph Transformer

【UCLA-微软-WWW2020】异构图Transformer，Heterogeneous Graph Transformer

专知会员服务

137+阅读 · 2020年3月8日

如何加速NVIDIA gpu上的训练、推理和ML应用？108页ppt，Accelerating training, inference, and ML applications on NVIDIA GPUs

如何加速NVIDIA gpu上的训练、推理和ML应用？108页ppt，Accelerating training, inference, and ML applications on NVIDIA GPUs

专知会员服务

61+阅读 · 2019年12月29日

【O'Reilly AI Conference 2019】部署大规模分布式数据（How to deploy large-scale distributed data analytics and machine learning on containers (sponsored by HPE))，HPE BlueData，Thomas Phelan

【O'Reilly AI Conference 2019】部署大规模分布式数据（How to deploy large-scale distributed data analytics and machine learning on containers (sponsored by HPE))，HPE BlueData，Thomas Phelan

专知会员服务

19+阅读 · 2019年11月5日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

不再让CPU和总线拖后腿：Exafunction让GPU跑的更快！

不再让CPU和总线拖后腿：Exafunction让GPU跑的更快！

机器之心

0+阅读 · 2022年10月7日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

【NeurIPS2019】图变换网络：Graph Transformer Network

【NeurIPS2019】图变换网络：Graph Transformer Network

专知

245+阅读 · 2019年11月18日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Deep Compression/Acceleration：模型压缩加速论文汇总

Deep Compression/Acceleration：模型压缩加速论文汇总

极市平台

14+阅读 · 2019年5月15日

【边缘智能】边缘计算驱动的深度学习加速技术

【边缘智能】边缘计算驱动的深度学习加速技术

产业智能官

20+阅读 · 2019年2月8日

硬件加速神经网络综述

硬件加速神经网络综述

计算机研究与发展

26+阅读 · 2019年2月1日

斯坦福大学Fall 2018课程-机器学习硬件加速器( 附PPT下载)

斯坦福大学Fall 2018课程-机器学习硬件加速器( 附PPT下载)

专知

18+阅读 · 2018年7月15日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

面向大规模动态异构网络的支持多用户并发任务的物联网应用构建方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

高能效FPGA高层次综合研究

国家自然科学基金

2+阅读 · 2013年12月31日

基于纠删码的大规模存储集群重构优化技术

国家自然科学基金

0+阅读 · 2013年12月31日

海量数据处理中面向任务加速的数据调度策略研究

国家自然科学基金

2+阅读 · 2013年12月31日

FPGA嵌入式抗辐照容错处理器核及其系统设计实现研究

国家自然科学基金

1+阅读 · 2012年12月31日

新布局规划及三维集成电路高速互连规划算法研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于GPU性能模型的异构系统优化技术研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于移动网格的局部间断Galerkin有限元方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于软硬件混合监测机制的多核环境下内存系统性能分析与优化技术研究

国家自然科学基金

0+阅读 · 2009年12月31日

矩阵分解的低延迟并行算法

国家自然科学基金

0+阅读 · 2009年12月31日

Accelerating Communications in Federated Applications with Transparent Object Proxies

Arxiv

0+阅读 · 2023年5月16日

The Power of Linear Recurrent Neural Networks

Arxiv

0+阅读 · 2023年5月12日

Graph Neural Network for Accurate and Low-complexity SAR ATR

Arxiv

0+阅读 · 2023年5月11日

Accelerating Statewide Connected Vehicles Big (Sensor Fusion) Data ETL Pipelines on GPUs

Arxiv

0+阅读 · 2023年5月8日

Enabling Deep Learning on Edge Devices

Arxiv

19+阅读 · 2022年10月6日

A Survey on Graph Neural Networks and Graph Transformers in Computer Vision: A Task-Oriented Perspective

Arxiv

21+阅读 · 2022年9月27日

Survey on Graph Neural Network Acceleration: An Algorithmic Perspective

Arxiv

12+阅读 · 2022年2月10日

A Survey of the Recent Architectures of Deep Convolutional Neural Networks

A Survey of the Recent Architectures of Deep Convolutional Neural Networks

Arxiv

39+阅读 · 2019年1月17日

Dynamic Graph Neural Networks

Arxiv

24+阅读 · 2018年10月24日

Bayesian Convolutional Neural Networks

Arxiv

19+阅读 · 2018年6月27日

VIP会员

文章信息

相关主题

最新内容

“愈演愈烈的欺骗与干扰博弈”：无人机与人工智能背景下俄乌强化以无人机为核心的电子战

“愈演愈烈的欺骗与干扰博弈”：无人机与人工智能背景下俄乌强化以无人机为核心的电子战

专知会员服务

0+阅读 · 4分钟前

乌克兰纵深打击如何重塑俄罗斯的战略选择

乌克兰纵深打击如何重塑俄罗斯的战略选择

专知会员服务

0+阅读 · 11分钟前

《分布式太空任务对比分析与综合建模及仿真环境》120页

《分布式太空任务对比分析与综合建模及仿真环境》120页

专知会员服务

0+阅读 · 22分钟前

俄乌战争中关于中程打击无人机部署的经验启示

俄乌战争中关于中程打击无人机部署的经验启示

专知会员服务

0+阅读 · 28分钟前

《远程自主系统可扩展态势感知的解决方案》32页2026最新报告

《远程自主系统可扩展态势感知的解决方案》32页2026最新报告

专知会员服务

4+阅读 · 7月23日

《基于强化学习的自动化红队测试》

《基于强化学习的自动化红队测试》

专知会员服务

4+阅读 · 7月23日

《下一代无人机-卫星通信：人工智能创新与未来展望》32页长综述

《下一代无人机-卫星通信：人工智能创新与未来展望》32页长综述

专知会员服务

6+阅读 · 7月23日

“天降毒雾”：无人机如何使化学战重返乌克兰战场

“天降毒雾”：无人机如何使化学战重返乌克兰战场

专知会员服务

2+阅读 · 7月23日

伊朗不对称防空战略的演进

伊朗不对称防空战略的演进

专知会员服务

4+阅读 · 7月23日

对抗环境下超视距目标打击的情报支援

对抗环境下超视距目标打击的情报支援

专知会员服务

10+阅读 · 7月22日

《面向复杂地形下无人机跟踪地面机器人（UAV–UGV）的自适应多滤波器扩展卡尔曼滤波框架》

《面向复杂地形下无人机跟踪地面机器人（UAV–UGV）的自适应多滤波器扩展卡尔曼滤波框架》

专知会员服务

4+阅读 · 7月22日

纵深侦察：大规模作战行动中远程侦察与监视之迫切需求

纵深侦察：大规模作战行动中远程侦察与监视之迫切需求

专知会员服务

8+阅读 · 7月22日

共享认知，分布式研判：复杂行动中的美国空军指挥控制（万字长文）

共享认知，分布式研判：复杂行动中的美国空军指挥控制（万字长文）

专知会员服务

11+阅读 · 7月22日

《无人机对海面作战影响评估》

《无人机对海面作战影响评估》

专知会员服务

15+阅读 · 7月21日

《可损耗无人系统规模化应用对美国军事转型的战略影响（2022-2030）》2026年270页

《可损耗无人系统规模化应用对美国军事转型的战略影响（2022-2030）》2026年270页

专知会员服务

15+阅读 · 7月21日

相关VIP内容

深度学习如何用在边缘设备上？苏黎世联邦理工Zhongnan博士论文《在边缘设备上使用深度学习》，182页pdf

深度学习如何用在边缘设备上？苏黎世联邦理工Zhongnan博士论文《在边缘设备上使用深度学习》，182页pdf

专知会员服务

92+阅读 · 2022年10月22日

【伯克利博士论文】硬件感知的高效深度学习，154页pdf

【伯克利博士论文】硬件感知的高效深度学习，154页pdf

专知会员服务

75+阅读 · 2022年10月20日

【深度神经网络加速器的硬件近似技术综述】Hardware Approximate Techniques for Deep Neural Network Accelerators: A Survey

【深度神经网络加速器的硬件近似技术综述】Hardware Approximate Techniques for Deep Neural Network Accelerators: A Survey

专知会员服务

16+阅读 · 2022年3月17日

【2021新书】并行高性能计算，705页pdf，Parallel and High Performance Computing

【2021新书】并行高性能计算，705页pdf，Parallel and High Performance Computing

专知会员服务

108+阅读 · 2021年10月30日

【伯克利】高效神经网络推理的量化方法综述论文

专知会员服务

51+阅读 · 2021年6月28日

可解释高效异构图卷积网络，Interpretable and Efficient Heterogeneous Graph Convolutional Network

可解释高效异构图卷积网络，Interpretable and Efficient Heterogeneous Graph Convolutional Network

专知会员服务

63+阅读 · 2020年7月12日

【UCLA-微软-WWW2020】异构图Transformer，Heterogeneous Graph Transformer

【UCLA-微软-WWW2020】异构图Transformer，Heterogeneous Graph Transformer

专知会员服务

137+阅读 · 2020年3月8日

如何加速NVIDIA gpu上的训练、推理和ML应用？108页ppt，Accelerating training, inference, and ML applications on NVIDIA GPUs

如何加速NVIDIA gpu上的训练、推理和ML应用？108页ppt，Accelerating training, inference, and ML applications on NVIDIA GPUs

专知会员服务

61+阅读 · 2019年12月29日

【O'Reilly AI Conference 2019】部署大规模分布式数据（How to deploy large-scale distributed data analytics and machine learning on containers (sponsored by HPE))，HPE BlueData，Thomas Phelan

【O'Reilly AI Conference 2019】部署大规模分布式数据（How to deploy large-scale distributed data analytics and machine learning on containers (sponsored by HPE))，HPE BlueData，Thomas Phelan

专知会员服务

19+阅读 · 2019年11月5日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

俄乌战争中关于中程打击无人机部署的经验启示

《基于强化学习的自动化红队测试》

《分布式太空任务对比分析与综合建模及仿真环境》120页

《远程自主系统可扩展态势感知的解决方案》32页2026最新报告

相关资讯

不再让CPU和总线拖后腿：Exafunction让GPU跑的更快！

不再让CPU和总线拖后腿：Exafunction让GPU跑的更快！

机器之心

0+阅读 · 2022年10月7日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

【NeurIPS2019】图变换网络：Graph Transformer Network

【NeurIPS2019】图变换网络：Graph Transformer Network

专知

245+阅读 · 2019年11月18日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Deep Compression/Acceleration：模型压缩加速论文汇总

Deep Compression/Acceleration：模型压缩加速论文汇总

极市平台

14+阅读 · 2019年5月15日

【边缘智能】边缘计算驱动的深度学习加速技术

【边缘智能】边缘计算驱动的深度学习加速技术

产业智能官

20+阅读 · 2019年2月8日

硬件加速神经网络综述

硬件加速神经网络综述

计算机研究与发展

26+阅读 · 2019年2月1日

斯坦福大学Fall 2018课程-机器学习硬件加速器( 附PPT下载)

斯坦福大学Fall 2018课程-机器学习硬件加速器( 附PPT下载)

专知

18+阅读 · 2018年7月15日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

相关论文

Accelerating Communications in Federated Applications with Transparent Object Proxies

Arxiv

0+阅读 · 2023年5月16日

The Power of Linear Recurrent Neural Networks

Arxiv

0+阅读 · 2023年5月12日

Graph Neural Network for Accurate and Low-complexity SAR ATR

Arxiv

0+阅读 · 2023年5月11日

Accelerating Statewide Connected Vehicles Big (Sensor Fusion) Data ETL Pipelines on GPUs

Arxiv

0+阅读 · 2023年5月8日

Enabling Deep Learning on Edge Devices

Arxiv

19+阅读 · 2022年10月6日

A Survey on Graph Neural Networks and Graph Transformers in Computer Vision: A Task-Oriented Perspective

Arxiv

21+阅读 · 2022年9月27日

Survey on Graph Neural Network Acceleration: An Algorithmic Perspective

Arxiv

12+阅读 · 2022年2月10日

A Survey of the Recent Architectures of Deep Convolutional Neural Networks

A Survey of the Recent Architectures of Deep Convolutional Neural Networks

Arxiv

39+阅读 · 2019年1月17日

Dynamic Graph Neural Networks

Arxiv

24+阅读 · 2018年10月24日

Bayesian Convolutional Neural Networks

Arxiv

19+阅读 · 2018年6月27日

相关基金

面向大规模动态异构网络的支持多用户并发任务的物联网应用构建方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

高能效FPGA高层次综合研究

国家自然科学基金

2+阅读 · 2013年12月31日

基于纠删码的大规模存储集群重构优化技术

国家自然科学基金

0+阅读 · 2013年12月31日

海量数据处理中面向任务加速的数据调度策略研究

国家自然科学基金

2+阅读 · 2013年12月31日

FPGA嵌入式抗辐照容错处理器核及其系统设计实现研究

国家自然科学基金

1+阅读 · 2012年12月31日

新布局规划及三维集成电路高速互连规划算法研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于GPU性能模型的异构系统优化技术研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于移动网格的局部间断Galerkin有限元方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于软硬件混合监测机制的多核环境下内存系统性能分析与优化技术研究

国家自然科学基金

0+阅读 · 2009年12月31日

矩阵分解的低延迟并行算法

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员