As Vision Transformers (ViTs) are increasingly deployed in safety-critical domains such as autonomous systems and medical imaging, ensuring their reliability against soft errors is paramount. While ViTs offer state-of-the-art accuracy, their massive parameter counts render exhaustive fault injection campaigns infeasible. To bridge this gap, a statistical fault injection framework is presented that leverages finite-population sampling theory to provide formal reliability guarantees. It is demonstrated that failure rates can be estimated within a 1% error margin at 99% confidence using only a few thousand samples, regardless of model scale. This methodology achieves up to a 10,700-fold reduction in experimental cost compared to exhaustive approaches, while preserving the ability to localize vulnerabilities across architectural components. Through extensive evaluation of ViT variants such as ViT-Tiny and ViT-Small, a highly non-uniform reliability landscape is uncovered. It is shown that although only 3% of FP32 bit-flips result in failure, the vast majority of these failures cause catastrophic accuracy collapse. Vulnerabilities are localized to normalization layers and to critical exponent bits of the IEEE-754 format, providing a mathematical foundation and actionable insights for the design of hardened, edge-deployed ViT architectures.
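The sampling argument can be illustrated with a minimal Python sketch. It assumes the standard finite-population sample-size formula for estimating a proportion, n = N / (1 + e^2 (N - 1) / (t^2 p (1 - p))), where N is the number of possible (parameter, bit) fault sites, e is the error margin, t is the z-score for the chosen confidence level, and p is the assumed failure probability. The abstract does not state which exact formula or prior the framework uses. The helper names `sample_size`, `sample_fault_sites`, and `flip_bit` are hypothetical, and the 5,000,000-parameter model is an illustrative placeholder, not a measured ViT size.

```python
import math
import random
import struct


def sample_size(population: int, margin: float = 0.01,
                confidence_z: float = 2.576, p: float = 0.5) -> int:
    """Finite-population sample size for estimating a failure proportion.

    population   -- N, total number of possible fault sites
    margin       -- e, desired half-width of the confidence interval
    confidence_z -- t, two-sided z-score (2.576 corresponds to ~99% confidence)
    p            -- assumed failure probability; 0.5 is the worst case
    """
    n = population / (1 + margin ** 2 * (population - 1)
                      / (confidence_z ** 2 * p * (1 - p)))
    return math.ceil(n)


def sample_fault_sites(num_params: int, n: int, seed: int = 0):
    """Draw n distinct (parameter index, bit position) pairs uniformly.

    Hypothetical helper: each FP32 parameter contributes 32 fault sites.
    """
    rng = random.Random(seed)
    population = num_params * 32
    # random.sample on a range draws without replacement and, for n much
    # smaller than the population, does not materialize the whole range.
    return [divmod(site, 32) for site in rng.sample(range(population), n)]


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of an FP32-representable value.

    Bits 0-22 are the mantissa, 23-30 the exponent, 31 the sign.
    """
    (word,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", word ^ (1 << bit)))
    return flipped


if __name__ == "__main__":
    num_params = 5_000_000          # hypothetical model size, illustration only
    population = num_params * 32    # every FP32 bit of every parameter
    n = sample_size(population)
    print(f"{population:,} fault sites -> {n:,} samples (99% confidence, 1% margin)")
    for idx, bit in sample_fault_sites(num_params, 3):
        print(f"flip bit {bit} of parameter {idx}")
    print(flip_bit(1.0, 30))        # inf: the exponent MSB turns 1.0 into +infinity
```

With the worst-case p = 0.5, the required n stays near 16,600 whether N is 10^8 or 10^9, which is why the sample count is essentially independent of model scale; a smaller prior such as p = 0.03 brings it down to roughly 1,900. Flipping bit 30, the most significant exponent bit, turns 1.0 into infinity, a concrete illustration of why high-order exponent bits are so critical.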