Non-volatile memory (NVM) crossbars have been identified as a promising technology for accelerating important machine learning operations, with matrix-vector multiplication being a key example. Binary neural networks (BNNs) are especially well-suited for NVM crossbars because they use a low-bitwidth representation for both activations and weights. However, the aggressive quantization of BNNs can result in suboptimal accuracy, and the analog effects of NVM crossbars can further degrade accuracy during inference. This paper presents a comprehensive study that benchmarks BNNs trained and validated on ImageNet and deployed on NeuroSim, a simulator for NVM-crossbar-based processing-in-memory (PIM) architectures. Our study analyzes the impact of various parameters, such as input precision and analog-to-digital converter (ADC) resolution, on both inference accuracy and hardware performance metrics. We found that an 8-bit ADC resolution combined with 4-bit input precision achieves accuracy close to that of the original BNNs. In addition, we identified the bottleneck components in the PIM architecture that affect area, latency, and energy consumption, and we demonstrate the impact that different BNN layers have on hardware performance.
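To make the roles of input precision and ADC resolution more concrete, the following is a minimal sketch of how a binary-weight crossbar might compute a matrix-vector product. It is an idealized model under stated assumptions, not NeuroSim's implementation or API: weights in {-1, +1} are assumed to be mapped onto a differential pair of arrays (one holding the +1 entries, one holding the -1 entries), multi-bit inputs are assumed to be applied bit-serially (one input bit per read cycle), each column sum is digitized by a uniform ADC spanning its full range, and the digitized results are combined by shift-and-add. Analog non-idealities such as device variation are ignored. The function names `adc_quantize`, `crossbar_mvm`, and `reference_mvm` are hypothetical.

```python
"""Idealized binary-weight crossbar MVM with bit-serial inputs and a finite-resolution ADC.

Hypothetical illustration only: not NeuroSim's model or API. Analog
non-idealities (device variation, wire resistance, noise) are ignored.
"""


def adc_quantize(value: int, max_value: int, adc_bits: int) -> int:
    """Digitize a column sum in [0, max_value] with a uniform ADC of adc_bits bits.

    The sum is rounded to the nearest of 2**adc_bits evenly spaced levels
    spanning [0, max_value] and mapped back to that integer scale.
    """
    levels = 2 ** adc_bits - 1
    if levels >= max_value:
        return value  # enough ADC levels to represent every possible sum exactly
    step = max_value / levels
    return round(round(value / step) * step)


def crossbar_mvm(x: list[int], weights: list[list[int]], input_bits: int, adc_bits: int) -> list[int]:
    """Compute x @ weights on an idealized crossbar.

    x: non-negative inputs, each representable in input_bits bits.
    weights: rows x cols matrix with entries in {-1, +1}; +1 entries are assumed
    to sit on a "positive" array and -1 entries on a "negative" array.
    """
    rows, cols = len(weights), len(weights[0])
    if len(x) != rows:
        raise ValueError("input length must match the number of crossbar rows")
    if input_bits < 1 or adc_bits < 1:
        raise ValueError("input_bits and adc_bits must be at least 1")
    if any(not 0 <= xi < 2 ** input_bits for xi in x):
        raise ValueError("inputs must fit in input_bits bits")
    if any(w not in (-1, 1) for row in weights for w in row):
        raise ValueError("binary weights must be -1 or +1")

    out = [0] * cols
    for b in range(input_bits):  # one read cycle per input bit (bit-serial)
        bits = [(xi >> b) & 1 for xi in x]
        for j in range(cols):
            pos = sum(bits[i] for i in range(rows) if weights[i][j] == 1)
            neg = sum(bits[i] for i in range(rows) if weights[i][j] == -1)
            # Each array's column sum is digitized separately, then subtracted.
            partial = adc_quantize(pos, rows, adc_bits) - adc_quantize(neg, rows, adc_bits)
            out[j] += partial * (1 << b)  # shift-and-add across input bits
    return out


def reference_mvm(x: list[int], weights: list[list[int]]) -> list[int]:
    """Exact integer x @ weights, for comparison."""
    return [sum(x[i] * weights[i][j] for i in range(len(x))) for j in range(len(weights[0]))]


if __name__ == "__main__":
    x = [15, 7, 3, 1, 0, 15, 8, 2]  # 4-bit inputs
    w = [[1, 1 if i % 2 == 0 else -1] for i in range(8)]  # 8 rows x 2 columns
    print(reference_mvm(x, w))                           # [51, 1]
    print(crossbar_mvm(x, w, input_bits=4, adc_bits=4))  # [51, 1]
    print(crossbar_mvm(x, w, input_bits=4, adc_bits=1))  # [24, 0]
```

In this model, a column sum over `rows` rows can take `rows + 1` distinct values, so digitization is exact only when `2**adc_bits - 1 >= rows`. With fewer ADC bits, each bit-serial partial sum is rounded, and the rounding errors are scaled by `2**b` and accumulated, as the 1-bit case above shows. Real designs may handle ADC range and sensing differently (for example, by clipping to a smaller range), so this sketch only illustrates the general trade-off.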