Resistive Random Access Memory (ReRAM) has emerged as a promising platform for deep neural networks (DNNs) due to its support for parallel in-situ matrix-vector multiplication. However, hardware failures, such as stuck-at faults, can cause significant prediction errors during model inference. Additional crossbars can be used to address these failures, but they add storage overhead and are inefficient in terms of space, energy, and cost. In this paper, we propose a fault protection mechanism that incurs zero space cost. Our approach includes: 1) differentiable structure pruning of rows and columns to reduce model redundancy, 2) weight duplication and voting for robust output, and 3) embedding duplicated most significant bits (MSBs) into the model weights. We evaluate our method on nine tasks of the GLUE benchmark with the BERT model, and experimental results demonstrate its effectiveness.
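The duplication-and-voting idea listed above can be illustrated with a small sketch. This is not the authors' implementation: the 8-bit unsigned weight format, the choice of three MSB copies, and the helper names `majority_vote`, `apply_stuck_at`, and `read_weight` are all assumptions made for illustration, and the sketch does not model where the copies are physically stored on the crossbar.

```python
# Illustrative sketch only: 8-bit unsigned weights and three MSB copies
# are assumptions, not details taken from the paper.

def majority_vote(bits):
    """Return the bit value held by more than half of the copies."""
    return 1 if sum(bits) * 2 > len(bits) else 0


def apply_stuck_at(bits, position, stuck_value):
    """Return a copy of `bits` with one cell forced to `stuck_value`,
    modelling a stuck-at-0 or stuck-at-1 fault."""
    faulty = list(bits)
    faulty[position] = stuck_value
    return faulty


def read_weight(msb_copies, low_bits, n_bits=8):
    """Rebuild an unsigned n-bit weight from the voted MSB and the low bits."""
    msb = majority_vote(msb_copies)
    return (msb << (n_bits - 1)) | low_bits


weight = 0b10110010            # hypothetical quantized weight
msb = weight >> 7              # most significant bit
low_bits = weight & 0x7F       # remaining seven bits

copies = [msb] * 3                        # duplicate the MSB three times
faulty = apply_stuck_at(copies, 0, 0)     # first copy is stuck at 0

recovered = read_weight(faulty, low_bits)     # voting masks the fault
unprotected = (faulty[0] << 7) | low_bits     # reading only the faulty cell

print(recovered == weight, unprotected == weight)
```

With a single faulty copy, the vote still returns the original MSB, so the weight is read back unchanged, whereas reading only the faulty cell loses the largest-magnitude bit of the weight.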