As the economic and environmental costs of training and deploying large vision and language models increase dramatically, analog in-memory computing (AIMC) has emerged as a promising energy-efficient alternative. However, the training perspective, especially the training dynamics, remains underexplored. In AIMC hardware, the trainable weights are represented by the conductance of resistive elements and updated by consecutive electrical pulses. Ideally, each pulse changes the conductance by a constant amount; in reality, the change is scaled by asymmetric and non-linear response functions, leading to non-ideal training dynamics. This paper provides a theoretical foundation for gradient-based training on AIMC hardware with non-ideal response functions. We show that asymmetric response functions negatively impact Analog SGD by imposing an implicit penalty on the objective. To overcome this issue, we propose the Residual Learning algorithm, which provably converges exactly to a critical point by solving a bilevel optimization problem. We further demonstrate that the proposed method can be extended to address other hardware imperfections, such as limited response granularity. To the best of our knowledge, this is the first work to investigate the impact of a class of generic non-ideal response functions. Our conclusions are supported by simulations that validate the theoretical insights.
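To make the non-ideal update concrete, the following minimal sketch simulates a single-parameter Analog SGD step in which the ideal increment is scaled by direction-dependent response functions. The specific linear forms `q_plus` and `q_minus`, the saturation bound `tau`, and the toy quadratic objective are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

# Hypothetical asymmetric response functions: potentiation and depression
# pulses are scaled differently as the weight w approaches its saturation
# limits +/- tau. The linear forms below are an illustrative assumption.
def q_plus(w, tau=1.0):
    return 1.0 - w / tau      # potentiation response shrinks as w grows

def q_minus(w, tau=1.0):
    return 1.0 + w / tau      # depression response shrinks as w decreases

def analog_sgd_step(w, grad, lr=0.1):
    """One non-ideal Analog SGD update: the ideal step -lr*grad is scaled by
    the direction-dependent response function at the current weight."""
    ideal_step = -lr * grad
    scale = np.where(ideal_step >= 0, q_plus(w), q_minus(w))
    return w + ideal_step * scale

# Toy quadratic objective f(w) = 0.5 * (w - 0.8)**2 with minimizer 0.8;
# the asymmetric scaling biases the iterates, acting like an implicit penalty.
w = np.zeros(1)
for _ in range(200):
    grad = w - 0.8
    w = analog_sgd_step(w, grad)
print(w)  # settles near, but not exactly at, the true minimizer 0.8
```

In this sketch the asymmetry between `q_plus` and `q_minus` is what pulls the fixed point away from the true critical point, which is the effect the abstract describes as an implicit penalty on the objective.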