Neural networks (NNs) are growing in importance and complexity. A neural network's performance (and energy efficiency) can be bound by either computation or memory resources. The processing-in-memory (PIM) paradigm, where computation is placed near or within memory arrays, is a viable way to accelerate memory-bound NNs. However, PIM architectures vary in form, and different PIM approaches lead to different trade-offs. Our goal is to analyze, discuss, and contrast DRAM-based PIM architectures in terms of NN performance and energy efficiency. To do so, we analyze three state-of-the-art PIM architectures: (1) UPMEM, which integrates processors and DRAM arrays into a single 2D chip; (2) Mensa, a 3D-stack-based PIM architecture tailored for edge devices; and (3) SIMDRAM, which uses the analog principles of DRAM to execute bit-serial operations. Our analysis reveals that PIM greatly benefits memory-bound NNs: (1) UPMEM provides 23x the performance of a high-end GPU when the GPU requires memory oversubscription for a general matrix-vector multiplication kernel; (2) Mensa improves energy efficiency and throughput by 3.0x and 3.1x, respectively, over the Google Edge TPU for 24 Google edge NN models; and (3) SIMDRAM outperforms a CPU and a GPU by 16.7x and 1.4x, respectively, for three binary NNs. We conclude that the ideal PIM architecture for a given NN model depends on that model's distinct attributes, because each PIM architecture embodies inherent design choices.
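To make the compute-bound versus memory-bound distinction concrete, the following is a minimal roofline-style sketch for a general matrix-vector multiplication (GEMV) kernel, the kind of kernel mentioned above. It is an illustration only, not the methodology of this work: the function names are hypothetical, and the peak-compute and bandwidth figures are placeholder assumptions rather than measurements of any device discussed here.

```python
def gemv_arithmetic_intensity(rows, cols, bytes_per_element=4):
    """FLOPs per byte moved for y = A @ x, assuming each operand is touched once."""
    flops = 2 * rows * cols  # one multiply and one add per matrix element
    # Read the matrix and the input vector, write the output vector.
    bytes_moved = bytes_per_element * (rows * cols + cols + rows)
    return flops / bytes_moved


def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound: performance is capped by compute or by memory traffic."""
    return min(peak_gflops, intensity * bandwidth_gbs)


def is_memory_bound(intensity, peak_gflops, bandwidth_gbs):
    """A kernel is memory-bound when its intensity is below the machine balance."""
    return intensity < peak_gflops / bandwidth_gbs


if __name__ == "__main__":
    # Hypothetical processor: 10 TFLOP/s peak compute, 900 GB/s memory bandwidth.
    PEAK_GFLOPS = 10_000.0
    BANDWIDTH_GBS = 900.0

    intensity = gemv_arithmetic_intensity(4096, 4096)
    print(f"GEMV arithmetic intensity: {intensity:.3f} FLOP/byte")
    print(f"Attainable: {attainable_gflops(intensity, PEAK_GFLOPS, BANDWIDTH_GBS):.1f} GFLOP/s")
    print(f"Memory-bound: {is_memory_bound(intensity, PEAK_GFLOPS, BANDWIDTH_GBS)}")
```

With 32-bit elements, GEMV performs roughly half a floating-point operation per byte moved, so under these assumed figures its attainable throughput is set by memory bandwidth rather than by peak compute. This is the regime in which placing computation near or within memory can help.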