In-Memory Computing (IMC) introduces a new paradigm of computation that offers high efficiency in terms of latency and power consumption for AI accelerators. However, the non-idealities and defects of the emerging technologies used in advanced IMC can severely degrade the inference accuracy of Neural Networks (NN) and lead to malfunctions in safety-critical applications. In this paper, we investigate an architectural-level mitigation technique based on the coordinated action of multiple checksum codes to detect and correct errors at run-time. Compared with more traditional methods such as Triple Modular Redundancy (TMR), this implementation recovers accuracy more efficiently across different AI algorithms and technologies. The results show that several configurations of our implementation recover more than 91% of the original accuracy while requiring less than half of the area of TMR and incurring a latency overhead below 40%.
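To illustrate how two checksum codes can cooperate to detect and correct an error, the sketch below applies a classic algorithm-based fault tolerance scheme to an integer matrix-vector product, the core operation of an IMC array. It is not the paper's implementation: the abstract does not specify which codes are used, so the pairing of a plain and a weighted checksum, the single-error assumption, the assumption that the checksum path itself is fault-free, and all names (`encode`, `checked_matvec`, `fault`) are hypothetical choices made for illustration.

```python
def dot(row, x):
    """Integer dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(row, x))


def encode(weights):
    """Build two checksum rows from a fault-free copy of the weights.

    plain[j]    = sum over rows i of weights[i][j]
    weighted[j] = sum over rows i of (i + 1) * weights[i][j]
    """
    cols = range(len(weights[0]))
    plain = [sum(row[j] for row in weights) for j in cols]
    weighted = [sum((i + 1) * row[j] for i, row in enumerate(weights))
                for j in cols]
    return plain, weighted


def checked_matvec(weights, checks, x, fault=None):
    """Compute y = weights @ x, then detect and correct one faulty output.

    `fault` is an optional (index, delta) pair that corrupts one output,
    emulating a defective cell (hypothetical injection hook).
    """
    plain, weighted = checks
    y = [dot(row, x) for row in weights]
    if fault is not None:
        index, delta = fault
        y[index] += delta

    # Syndromes: both are zero when the outputs match the checksums.
    d1 = sum(y) - dot(plain, x)
    d2 = sum((i + 1) * v for i, v in enumerate(y)) - dot(weighted, x)

    if d1 == 0 and d2 == 0:
        return y, "clean"
    # A single error e at row k gives d1 = e and d2 = (k + 1) * e,
    # so the ratio d2 / d1 locates the row and d1 gives the magnitude.
    if d1 != 0 and d2 % d1 == 0 and 1 <= d2 // d1 <= len(y):
        y[d2 // d1 - 1] -= d1
        return y, "corrected"
    return y, "uncorrectable"
```

In this sketch the first checksum only detects that an output is wrong, and the second one is what makes localization possible, which is why the two codes have to act together. Errors in more than one output can defeat the scheme: depending on the values they may be flagged as uncorrectable, miscorrected, or missed altogether.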