The reliability of AI systems is a fundamental concern for the successful deployment and widespread adoption of AI technologies. Unfortunately, the escalating complexity and heterogeneity of AI hardware systems make them increasingly susceptible to hardware faults (e.g., bit flips) that can corrupt model parameters. Given this challenge, this paper aims to answer a critical question: How likely is a parameter corruption to result in an incorrect model output? To systematically answer this question, we propose a novel quantitative metric, the Parameter Vulnerability Factor (PVF), inspired by the architectural vulnerability factor (AVF) from the computer architecture community, to standardize the quantification of AI model resilience/vulnerability against parameter corruptions. We define a model parameter's PVF as the probability that a corruption in that particular parameter will result in an incorrect output. Similar to AVF, this statistical quantity can be derived from statistically meaningful, large-scale fault injection (FI) experiments. In this paper, we present several use cases applying PVF to three types of tasks/models during inference: recommendation (DLRM), vision classification (CNN), and text classification (BERT). PVF can provide pivotal insights to AI hardware designers in balancing the tradeoff between fault protection and performance/efficiency, for example by mapping vulnerable AI parameter components to well-protected hardware modules. The PVF metric is applicable to any AI model and has the potential to help unify and standardize AI vulnerability/resilience evaluation practice.
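To make the empirical estimation of PVF concrete, the following is a minimal sketch of a fault-injection loop for a single parameter tensor of a PyTorch classification model: it repeatedly flips one random bit of one random element, runs inference, and reports the fraction of trials whose predictions deviate from the fault-free (golden) output. The model, `param_name`, `eval_batch`, and the use of a golden-output mismatch as the incorrectness criterion are illustrative assumptions for this sketch, not the exact experimental setup used in the paper.

```python
# Hedged sketch: empirical PVF estimation via random single-bit-flip fault injection.
# Assumes a PyTorch classification model whose forward pass returns logits,
# and an eval_batch of the form (inputs, labels); both are placeholders.
import random
import struct

import torch


def flip_random_bit(value: float) -> float:
    """Flip one random bit in the IEEE-754 float32 encoding of `value`."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    bits ^= 1 << random.randrange(32)
    return struct.unpack("<f", struct.pack("<I", bits))[0]


@torch.no_grad()
def estimate_pvf(model, param_name: str, eval_batch, num_trials: int = 1000) -> float:
    """Estimate PVF for one parameter tensor as the fraction of fault-injection
    trials whose predictions differ from the fault-free (golden) predictions."""
    param = dict(model.named_parameters())[param_name]
    flat = param.data.view(-1)                    # assumes a contiguous tensor
    inputs, _ = eval_batch
    golden = model(inputs).argmax(dim=-1)         # fault-free predictions

    mismatches = 0
    for _ in range(num_trials):
        idx = random.randrange(flat.numel())      # pick a random element
        original = flat[idx].item()
        flat[idx] = flip_random_bit(original)     # inject a single bit flip
        corrupted = model(inputs).argmax(dim=-1)
        mismatches += int(not torch.equal(corrupted, golden))
        flat[idx] = original                      # restore before next trial
    return mismatches / num_trials
```

In practice, a statistically meaningful estimate would repeat this over many evaluation batches and enough trials to tighten the confidence interval, and the mismatch criterion would be adapted per task (e.g., top-1 label for classification, ranking quality for recommendation).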