Emerging deep learning workloads urgently need fast general matrix multiplication (GEMM). To meet this demand, a key feature of machine-learning-specific accelerators such as NVIDIA Tensor Cores, AMD Matrix Cores, and Google TPUs is support for mixed-precision GEMM. For DNN models, lower-precision floating-point (FP) data formats and computation preserve acceptable accuracy while significantly improving performance, area, and memory footprint. Despite this promise, the error resilience of mixed-precision computation remains unexplored. To this end, we develop a fault injection framework that systematically injects faults into the results of mixed-precision computation, and we use it to investigate how these faults affect the accuracy of machine learning applications. Based on the observed error resilience characteristics, we propose lightweight error detection and correction solutions that significantly improve overall model accuracy when models experience hardware faults. These solutions can be efficiently integrated into the accelerator's pipelines.
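The abstract does not say how the framework models or places faults. As a minimal sketch, assuming a single-bit-flip fault model applied to one output element of a GEMM, the Python below models FP16 inputs with FP32 accumulation (one common Tensor Core configuration) in software and flips one randomly chosen bit of the FP32 result. All names (`round_to`, `flip_bit`, `mixed_precision_gemm`, `inject_fault`) are hypothetical illustrations and are not the authors' framework.

```python
import random
import struct

# Hypothetical helpers illustrating single-bit-flip injection; not the paper's framework.
# Per format: struct code for the float, struct code for the same-width unsigned int, bit width.
_FORMATS = {
    "fp16": ("<e", "<H", 16),
    "fp32": ("<f", "<I", 32),
}


def round_to(value: float, fmt: str) -> float:
    """Round a Python float to the nearest value representable in `fmt`."""
    float_code, _, _ = _FORMATS[fmt]
    return struct.unpack(float_code, struct.pack(float_code, value))[0]


def flip_bit(value: float, bit: int, fmt: str = "fp32") -> float:
    """Return `value` encoded in `fmt` with bit `bit` inverted (0 = mantissa LSB, top bit = sign)."""
    float_code, int_code, width = _FORMATS[fmt]
    if not 0 <= bit < width:
        raise ValueError(f"bit must be in [0, {width})")
    raw = struct.unpack(int_code, struct.pack(float_code, value))[0]
    return struct.unpack(float_code, struct.pack(int_code, raw ^ (1 << bit)))[0]


def mixed_precision_gemm(a, b):
    """C = A @ B with FP16-rounded inputs and FP32 accumulation (a software model only)."""
    n, k, m = len(a), len(b), len(b[0])
    a16 = [[round_to(x, "fp16") for x in row] for row in a]
    b16 = [[round_to(x, "fp16") for x in row] for row in b]
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                # An FP16 x FP16 product is exact in FP32; round the running sum to FP32.
                acc = round_to(acc + a16[i][p] * b16[p][j], "fp32")
            c[i][j] = acc
    return c


def inject_fault(c, rng, fmt="fp32"):
    """Flip one uniformly chosen bit of one uniformly chosen element of `c`, in place."""
    i = rng.randrange(len(c))
    j = rng.randrange(len(c[0]))
    bit = rng.randrange(_FORMATS[fmt][2])
    c[i][j] = flip_bit(c[i][j], bit, fmt)
    return i, j, bit


if __name__ == "__main__":
    rng = random.Random(0)
    c = mixed_precision_gemm([[0.1, 0.2], [0.3, 0.4]], [[1.0, 2.0], [3.0, 4.0]])
    golden = [row[:] for row in c]
    i, j, bit = inject_fault(c, rng)
    print(f"flipped bit {bit} of C[{i}][{j}]: {golden[i][j]!r} -> {c[i][j]!r}")
```

Comparing the faulty output against the fault-free `golden` copy, then propagating it through the remaining layers, is one way to measure how a flip in a sign, exponent, or mantissa bit affects model accuracy. For example, flipping the top exponent bit of `1.0` in FP32 yields infinity, while flipping a low mantissa bit barely changes the value.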