In-memory computing (IMC) architectures for deep learning (DL) accelerators leverage energy-efficient, highly parallel matrix-vector multiplication (MVM) operations implemented directly in memory arrays. Such IMC designs have been explored based on CMOS as well as emerging non-volatile memory (NVM) technologies like RRAM. IMC architectures generally comprise a large number of cores built around memory arrays that store the trained weights of the DL model. Peripheral units such as DACs and ADCs apply the inputs and read out the output values. Recently reported designs reveal that the ADCs required for reading out the MVM results consume more than 85% of the total compute power and also dominate the area, thereby eroding the benefits of the IMC scheme. Mitigating ADC imperfections, namely non-linearity and variations, incurs significant design overheads due to dedicated calibration units. In this work we present a peripheral-aware design of IMC cores to mitigate such overheads. It incorporates the non-idealities of the ADCs, along with those of the memory units, into the training of the DL models. The proposed approach applies equally well to both current-mode and charge-mode MVM operations demonstrated in recent years, and can significantly simplify the design of mixed-signal IMC units.
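To make the idea concrete, below is a minimal sketch, assuming a PyTorch setting, of how ADC non-idealities could be folded into DL training: an MVM layer whose analog output passes through a differentiable ADC model combining quantization, an assumed compressive non-linearity, and frozen per-column gain/offset variations, with a straight-through estimator so gradients flow. The class names, transfer curve, and variation magnitudes are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class NonIdealADC(nn.Module):
    """Differentiable model of an n-bit ADC with a static non-linearity
    and frozen per-column gain/offset variations (illustrative values)."""
    def __init__(self, n_columns, n_bits=8, gain_sigma=0.02, offset_sigma=0.01):
        super().__init__()
        self.half_levels = 2 ** (n_bits - 1) - 1
        # Variations sampled once, mimicking fabrication-time mismatch.
        self.register_buffer("gain", 1.0 + gain_sigma * torch.randn(n_columns))
        self.register_buffer("offset", offset_sigma * torch.randn(n_columns))

    def forward(self, x):
        # Assumed compressive transfer curve standing in for ADC non-linearity.
        x = torch.tanh(1.5 * x) / 1.5
        x = self.gain * x + self.offset       # per-column gain/offset errors
        x = x.clamp(-1.0, 1.0)                # ADC full-scale input range
        q = torch.round(x * self.half_levels) / self.half_levels
        return x + (q - x).detach()           # straight-through estimator

class IMCLinear(nn.Module):
    """MVM layer whose output is read out through the ADC model,
    so training 'sees' and compensates for the readout imperfections."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(0.01 * torch.randn(out_features, in_features))
        self.adc = NonIdealADC(n_columns=out_features)

    def forward(self, x):
        return self.adc(x @ self.weight.t())

# Train as usual; gradients flow through the ADC model via the STE,
# so the learned weights absorb the non-linearity and variations.
layer = IMCLinear(128, 256)
out = layer(torch.rand(4, 128))
out.sum().backward()
```

Under these assumptions the loss is computed on the ADC's actual transfer characteristic, so the optimizer learns weights that compensate for the readout chain rather than relying on dedicated per-chip calibration hardware, mirroring the overhead reduction the abstract claims.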