Processing-In-Memory (PIM) accelerators have attracted significant attention in the research community. However, deploying large-scale neural networks on PIM accelerators is challenging because on-chip memory capacity is limited. To address this issue, existing work applies model compression algorithms to reduce the size of Convolutional Neural Networks (CNNs). Most of these algorithms either represent neural operators with fewer or lower-precision parameters (e.g., quantization) or search for the best combination of neural operators (e.g., neural architecture search). Designing neural operators that match the specifications of PIM accelerators is an area that warrants further study. In this paper, we introduce the Epitome, a lightweight neural operator offering convolution-like functionality, to build memory-efficient CNN operators for PIM accelerators (EPIM). On the software side, we evaluate the latency and energy of epitomes on PIM accelerators and introduce a PIM-aware layer-wise design method to improve their hardware efficiency. We also apply epitome-aware quantization to further reduce the size of epitomes. On the hardware side, we modify the datapath of current PIM accelerators to accommodate epitomes and implement a feature map reuse technique to reduce computation cost. Experimental results show that our 3-bit quantized EPIM-ResNet50 attains 71.59% top-1 accuracy on ImageNet while reducing crossbar area by 30.65×. EPIM outperforms state-of-the-art pruning methods on PIM.
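To make the idea of a compact operator with convolution-like functionality more concrete, the following is a minimal sketch of how a small parameter tensor could be expanded into a full convolution weight. The wrap-around sampling rule, the function names `expand_epitome` and `compression_ratio`, and the example shapes are assumptions chosen for illustration; they are not taken from the abstract and may differ from the sampling scheme EPIM actually uses.

```python
import numpy as np


def expand_epitome(epitome, out_channels, in_channels):
    """Expand a small 4-D tensor into a full conv weight (hypothetical rule).

    `epitome` has shape (e_out, e_in, kh, kw). Output and input channels of
    the full weight reuse epitome channels via wrap-around (modulo) indexing,
    so only the epitome itself has to be stored.
    """
    e_out, e_in, _, _ = epitome.shape
    out_idx = np.arange(out_channels) % e_out  # which epitome filter to reuse
    in_idx = np.arange(in_channels) % e_in     # which epitome channel to reuse
    # np.ix_ builds an open mesh, giving shape (out_channels, in_channels, kh, kw)
    return epitome[np.ix_(out_idx, in_idx)]


def compression_ratio(epitome, out_channels, in_channels):
    """Parameters of the full conv weight divided by stored epitome parameters."""
    kh, kw = epitome.shape[2], epitome.shape[3]
    full_params = out_channels * in_channels * kh * kw
    return full_params / epitome.size


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    epitome = rng.standard_normal((16, 8, 3, 3))   # assumed epitome shape
    weight = expand_epitome(epitome, out_channels=64, in_channels=32)
    print(weight.shape)
    print(compression_ratio(epitome, 64, 32))
```

Under these assumed shapes, the stored tensor holds one sixteenth of the parameters of the 64×32×3×3 convolution it stands in for. On a PIM accelerator, fewer stored weights would translate into fewer crossbar cells, which is the kind of saving the abstract reports, although the reported 30.65× figure also reflects quantization and the layer-wise design method.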