Traditional CPU, GPU, and NPU architectures are increasingly limited by the von Neumann bottleneck. In-Memory Computing (IMC) using ReRAM crossbar arrays offers a high-density, energy-efficient alternative, but its practical deployment is constrained by device non-idealities. Existing hardware-aware training frameworks often require training from scratch, which is computationally prohibitive for modern large-scale models. In this work, we propose a finetuning-based hardware-aware training algorithm that enables robust DNN deployment on ReRAM with minimal training overhead. Our approach mitigates I-V non-linearity by applying a range-shrunk sinh transformation and incorporates retention errors directly into a regularization loss during finetuning. We evaluate our framework across models and tasks, including image classification and question answering (QA). Experimental results show that our method achieves accuracy similar to the base model on large-scale models such as ResNet18 and DeiT-Tiny. On ImageNet, the MobileNetV3 family shows less than 2% accuracy degradation. Further, on the SQuAD v2 dataset, the technique incurs only a 1-point degradation in F1 score.
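The abstract does not spell out the range-shrunk sinh transformation, so the following is only an illustrative sketch of why shrinking the input range can help under a sinh-shaped I-V curve. It is not the authors' algorithm. The device model `sinh_iv`, the curvature parameter `alpha`, the helper `max_relative_nonlinearity`, and the chosen ranges are all hypothetical and are assumed here purely for illustration.

```python
import math

def sinh_iv(v, g=1.0, alpha=2.0):
    """Hypothetical cell model: current for input voltage v through a cell
    with conductance g, following a sinh-shaped I-V curve.
    alpha is an assumed curvature parameter; as alpha*v -> 0 this tends to g*v."""
    return g * math.sinh(alpha * v) / alpha

def max_relative_nonlinearity(v_max, alpha=2.0, steps=100):
    """Largest relative deviation of the sinh model from the ideal linear
    response g*v, sampled over inputs in (0, v_max] with g = 1."""
    worst = 0.0
    for i in range(1, steps + 1):
        v = v_max * i / steps
        ideal = v                       # ideal Ohmic response for g = 1
        actual = sinh_iv(v, 1.0, alpha)
        worst = max(worst, abs(actual - ideal) / ideal)
    return worst

# Compare the full (assumed) input range with a range shrunk by a factor of 4.
full = max_relative_nonlinearity(1.0)
shrunk = max_relative_nonlinearity(0.25)
print(f"full range deviation:   {full:.4f}")
print(f"shrunk range deviation: {shrunk:.4f}")
```

Because sinh(x)/x - 1 grows roughly as x²/6 for small x, restricting inputs to a narrower range keeps the cell closer to its linear regime, at the cost of a smaller signal swing. How the paper balances that trade-off and folds it into finetuning is not stated in the abstract.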