In the rapidly evolving landscape of neural network security, the resilience of neural networks against bit-flip attacks (i.e., an attacker maliciously flips an extremely small amount of bits within its parameter storage memory system to induce harmful behavior), has emerged as a relevant area of research. Existing studies suggest that quantization may serve as a viable defense against such attacks. Recognizing the documented susceptibility of real-valued neural networks to such attacks and the comparative robustness of quantized neural networks (QNNs), in this work, we introduce BFAVerifier, the first verification framework designed to formally verify the absence of bit-flip attacks or to identify all vulnerable parameters in a sound and rigorous manner. BFAVerifier comprises two integral components: an abstraction-based method and an MILP-based method. Specifically, we first conduct a reachability analysis with respect to symbolic parameters that represent the potential bit-flip attacks, based on a novel abstract domain with a sound guarantee. If the reachability analysis fails to prove the resilience of such attacks, then we encode this verification problem into an equivalent MILP problem which can be solved by off-the-shelf solvers. Therefore, BFAVerifier is sound, complete, and reasonably efficient. We conduct extensive experiments, which demonstrate its effectiveness and efficiency across various network architectures, quantization bit-widths, and adversary capabilities.
翻译:在神经网络安全快速发展的背景下,神经网络对抗比特翻转攻击(即攻击者恶意翻转其参数存储内存系统中极少量的比特位以诱发有害行为)的鲁棒性已成为一个重要研究领域。现有研究表明,量化可能作为抵御此类攻击的有效手段。认识到已有文献记载的实值神经网络对此类攻击的脆弱性以及量化神经网络(QNNs)相对较强的鲁棒性,本文提出了BFAVerifier——首个旨在形式化验证比特翻转攻击不存在性或以严谨可靠方式识别所有脆弱参数的验证框架。BFAVerifier包含两个核心组件:基于抽象的方法和基于混合整数线性规划(MILP)的方法。具体而言,我们首先基于具有可靠性保证的新型抽象域,对表征潜在比特翻转攻击的符号参数进行可达性分析。若可达性分析无法证明此类攻击的抵抗性,则将该验证问题编码为等价的MILP问题,并可通过现成求解器进行求解。因此,BFAVerifier具备可靠性、完备性及合理的高效性。我们进行了大量实验,结果证明了该框架在不同网络架构、量化比特宽度和攻击者能力条件下的有效性与高效性。