Bulk-Switching Memristor-based Compute-In-Memory Module for Deep Neural Network Training

The need for deep neural network (DNN) models with higher performance and better functionality has led to the proliferation of very large models. Model training, however, requires intensive computation time and energy. Memristor-based compute-in-memory (CIM) modules can perform vector-matrix multiplication (VMM) in situ and in parallel, and have shown great promise in DNN inference applications. However, CIM-based model training faces challenges due to non-linear weight updates, device variations, and low precision in analog computing circuits. In this work, we experimentally implement a mixed-precision training scheme to mitigate these effects using a bulk-switching memristor CIM module. Low-precision CIM modules are used to accelerate the expensive VMM operations, with high-precision weight updates accumulated in digital units. Memristor devices are only changed when the accumulated weight update value exceeds a pre-defined threshold. The proposed scheme is implemented with a system-on-chip (SoC) of fully integrated analog CIM modules and digital sub-systems, showing fast convergence of LeNet training to 97.73%. The efficacy of training larger models is evaluated using realistic hardware parameters and shows that analog CIM modules can enable efficient mixed-precision DNN training with accuracy comparable to full-precision software-trained models. Additionally, models trained on chip are inherently robust to hardware variations, allowing direct mapping to CIM inference chips without additional re-training.
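The threshold-gated update described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the class name `MixedPrecisionLayer`, the `step` and `threshold` parameters, and the uniform rounding used to stand in for low-precision devices are all assumptions made for illustration, and the sketch models neither non-linear weight updates nor device variation.

```python
import numpy as np


def quantize(x, step):
    """Round to the nearest multiple of `step` (idealized low-precision device)."""
    return np.round(x / step) * step


class MixedPrecisionLayer:
    """Hypothetical layer: low-precision device weights plus a digital accumulator."""

    def __init__(self, weights, step=0.25, threshold=0.25):
        self.step = step            # assumed programmable weight increment
        self.threshold = threshold  # assumed pre-defined update threshold
        # Weights as stored on the (simulated) memristor array.
        self.device = quantize(np.asarray(weights, dtype=float), step)
        # High-precision weight updates accumulated in the digital domain.
        self.acc = np.zeros_like(self.device)

    def forward(self, x):
        # Stand-in for the VMM performed by the CIM module.
        return x @ self.device

    def accumulate(self, grad, lr):
        """Accumulate an SGD update; program only devices whose total exceeds the threshold."""
        self.acc -= lr * grad
        mask = np.abs(self.acc) > self.threshold
        # Transfer whole multiples of `step`; the remainder stays in the accumulator.
        delta = np.where(mask, np.trunc(self.acc / self.step) * self.step, 0.0)
        self.device += delta
        self.acc -= delta
        return int(mask.sum())  # number of devices programmed in this call


if __name__ == "__main__":
    layer = MixedPrecisionLayer(np.zeros((2, 2)))
    grad = np.full((2, 2), -0.125)
    for i in range(3):
        programmed = layer.accumulate(grad, lr=1.0)
        print(f"step {i}: devices programmed = {programmed}")
    print(layer.forward(np.ones(2)))
```

Small gradient steps leave the device array untouched and only grow the accumulator, so the devices are written far less often than once per training step. In this sketch the part of the update that was not transferred stays in the accumulator rather than being discarded.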

