Vision Transformers (ViTs), with their outstanding performance, have become a popular backbone of deep learning models for mainstream vision tasks including classification, object detection, and segmentation. Beyond performance, reliability is also a critical metric for adopting ViTs in safety-critical applications such as autonomous driving and robotics. Observing that the major computing blocks in ViTs, such as multi-head attention and feed-forward networks, are usually performed with general matrix multiplication (GEMM), we propose to adopt a classical algorithm-based fault tolerance (ABFT) strategy, originally developed for GEMM, to protect ViTs against soft errors in the underlying computing engines. Unlike classical ABFT, which invokes the expensive error recovery procedure whenever computing errors are detected, we leverage the inherent fault tolerance of ViTs and propose an approximate ABFT, namely ApproxABFT, that invokes the error recovery procedure only when the computing errors are significant enough. This skips many unnecessary error recovery procedures and simplifies the overall GEMM error recovery. According to our experiments, ApproxABFT reduces the computing overhead by 25.92% to 81.62% and improves the model accuracy by 2.63% to 72.56% compared to the baseline ABFT.
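To make the idea of threshold-gated recovery concrete, the following is a minimal sketch of checksum-based ABFT for a single GEMM, not the paper's implementation. It relies on the standard checksum identities (the row sums of A·B equal A times the row sums of B, and likewise for columns). The function names, the scalar `threshold`, the "largest checksum deviation" criterion, and the single-error correction step are all illustrative assumptions; the abstract does not specify how ApproxABFT measures error significance or performs recovery.

```python
def matmul(a, b):
    """Plain GEMM on lists of lists (stands in for the protected compute engine)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def checksum_residuals(a, b, c):
    """Return (row_diff, col_diff): observed sums of c minus checksum predictions."""
    # Row sums of A @ B equal A @ (row sums of B).
    b_row_sums = [sum(row) for row in b]
    expected_rows = [sum(a[i][k] * b_row_sums[k] for k in range(len(b)))
                     for i in range(len(a))]
    # Column sums of A @ B equal (column sums of A) @ B.
    a_col_sums = [sum(a[i][k] for i in range(len(a))) for k in range(len(b))]
    expected_cols = [sum(a_col_sums[k] * b[k][j] for k in range(len(b)))
                     for j in range(len(b[0]))]
    row_diff = [sum(c[i]) - expected_rows[i] for i in range(len(c))]
    col_diff = [sum(c[i][j] for i in range(len(c))) - expected_cols[j]
                for j in range(len(c[0]))]
    return row_diff, col_diff


def approx_abft_check(a, b, c, threshold):
    """Correct c in place only when the checksum deviation exceeds `threshold`.

    Hypothetical criterion: the largest absolute checksum residual.
    Returns "clean", "skipped", or "corrected". Handles a single faulty element.
    """
    row_diff, col_diff = checksum_residuals(a, b, c)
    worst = max(abs(d) for d in row_diff + col_diff)
    if worst == 0:  # exact for integers; floats would need a small tolerance
        return "clean"
    if worst <= threshold:
        return "skipped"  # small error is tolerated, no recovery invoked
    # Locate the faulty element at the intersection of the worst row and column.
    i = max(range(len(row_diff)), key=lambda r: abs(row_diff[r]))
    j = max(range(len(col_diff)), key=lambda s: abs(col_diff[s]))
    c[i][j] -= row_diff[i]  # the row residual equals the injected error
    return "corrected"
```

With `threshold=0` this sketch behaves like a classical ABFT that recovers on every detected mismatch; raising the threshold trades recovery work for tolerated error, which is the trade-off the abstract describes.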