Vision Transformers (ViTs), which leverage the self-attention mechanism, have shown superior performance on many classical vision tasks compared to convolutional neural networks (CNNs) and have been gaining popularity. Existing work on ViTs mainly optimizes performance and accuracy, while the reliability issues induced by hardware faults in large-scale VLSI designs have generally been overlooked. In this work, we study the reliability of ViTs and, for the first time, investigate their vulnerability at different architectural granularities: models, layers, modules, and patches. The investigation reveals that ViTs with the self-attention mechanism are generally more resilient in linear computing, including general matrix-matrix multiplication (GEMM) and fully connected (FC) layers, and show a relatively even vulnerability distribution across patches. However, ViTs involve more fragile non-linear computing, such as softmax and GELU, than typical CNNs. Based on these observations, we propose an adaptive algorithm-based fault tolerance (ABFT) approach to protect the linear computing implemented with GEMMs of distinct sizes, and we apply a range-based protection scheme to mitigate soft errors in non-linear computing. According to our experiments, the proposed fault-tolerant approaches enhance ViT accuracy significantly with minor computing overhead in the presence of various soft errors.
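To make the two protection ideas named above concrete, the following is a minimal sketch, not the adaptive scheme proposed in this work. It assumes the classic column-checksum form of ABFT for a GEMM and a simple clamp for range-based protection. The function names (`matmul`, `abft_check`, `range_restrict`), the tolerance, and the choice to map NaN to the lower bound are all illustrative assumptions.

```python
import math


def matmul(a, b):
    """Plain GEMM on lists of lists: C = A @ B."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def abft_check(a, b, c, tol=1e-6):
    """Column-checksum ABFT check (illustrative).

    For a fault-free product, the column sums of C equal
    (column sums of A) @ B. Returns the indices of the columns of C
    whose checksum disagrees, which is where a fault is suspected.
    """
    col_sum_a = [sum(row[k] for row in a) for k in range(len(a[0]))]
    expected = [sum(col_sum_a[k] * b[k][j] for k in range(len(b)))
                for j in range(len(b[0]))]
    actual = [sum(row[j] for row in c) for j in range(len(c[0]))]
    return [j for j in range(len(actual))
            if abs(expected[j] - actual[j]) > tol * max(1.0, abs(expected[j]))]


def range_restrict(values, lo, hi):
    """Range-based protection (illustrative): clamp each value to [lo, hi].

    NaN is mapped to lo here; that choice is an assumption of this sketch.
    """
    return [lo if math.isnan(v) else min(max(v, lo), hi) for v in values]


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    c = matmul(a, b)
    print(abft_check(a, b, c))   # fault-free product: no columns flagged
    c[0][1] += 1024.0            # emulate a soft error in one output element
    print(abft_check(a, b, c))   # the corrupted column is flagged
    # Softmax outputs lie in [0, 1], so out-of-range values can be clamped.
    print(range_restrict([0.2, 3.0e8, -5.0], 0.0, 1.0))
```

The checksum costs one extra vector-matrix product and one column reduction per GEMM, which is why checksum-style ABFT is usually cheap relative to the multiplication it guards. The clamp relies on the protected function having a known output range, as softmax does with [0, 1].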