In-memory computing with resistive crossbar arrays has been suggested as a highly efficient way to accelerate deep-learning workloads. To unleash the full potential of in-memory computing, it is desirable to accelerate training as well as inference for large deep neural networks (DNNs). In the past, specialized in-memory training algorithms have been proposed that not only accelerate the forward and backward passes, but also establish tricks to update the weights in memory and in parallel. However, the state-of-the-art algorithm, Tiki-Taka version 2 (TTv2), still requires near-perfect offset correction and suffers from potential biases that might occur due to programming and estimation inaccuracies, as well as longer-term instabilities of the device materials. Here we propose and describe two new and improved algorithms for in-memory computing, Chopped-TTv2 (c-TTv2) and Analog Gradient Accumulation with Dynamic reference (AGAD), that retain the same runtime complexity but correct for any remaining offsets using choppers. These algorithms greatly relax the device requirements, thus expanding the scope of materials that could potentially be employed for such fast in-memory DNN training.
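The role of the choppers can be illustrated with a toy model. The sketch below is not c-TTv2 or AGAD; it only shows the generic chopper principle under simplifying assumptions: a constant true gradient, a constant unknown readout offset, and no noise. The function name `estimate_gradient` and all parameter values are hypothetical and chosen for illustration only.

```python
def estimate_gradient(true_grad, offset, n_steps, use_chopper):
    """Average n_steps readouts of a signal corrupted by a constant offset.

    Toy model (assumption): the readout is the (possibly sign-flipped)
    signal plus a fixed offset that does not depend on the chopper sign.
    """
    total = 0.0
    chopper = 1.0
    for step in range(n_steps):
        if use_chopper:
            # Alternate the sign of the signal on every step.
            chopper = 1.0 if step % 2 == 0 else -1.0
        # The offset is added after modulation, so it is not sign-flipped.
        readout = chopper * true_grad + offset
        # Demodulate: the signal regains its sign, the offset now alternates.
        total += chopper * readout
    return total / n_steps


plain = estimate_gradient(0.3, 0.5, 100, use_chopper=False)
chopped = estimate_gradient(0.3, 0.5, 100, use_chopper=True)
print(f"without chopper: {plain:.3f}")
print(f"with chopper:    {chopped:.3f}")
```

In this toy setting the unchopped estimate is biased by the full offset, whereas the chopped estimate averages the offset away over an even number of steps. How the proposed algorithms schedule the chopper and handle noisy, drifting devices is beyond what this sketch captures.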