Deep neural networks (DNNs) are widely deployed on real-world devices, and their security has attracted considerable attention from researchers. Recently, a new weight modification attack called the bit flip attack (BFA) was proposed. It exploits memory fault injection techniques such as Rowhammer to attack quantized models in the deployment stage. With only a few bit flips, the target model can be reduced to a random guesser or even implanted with malicious functionality. In this work, we seek to further reduce the number of bit flips required. We propose a training-assisted bit flip attack, in which the adversary takes part in the training stage to build a high-risk model for release. This high-risk model, obtained jointly with a corresponding malicious model, behaves normally and can evade various detection methods. Results on benchmark datasets show that, in the deployment stage, an adversary can easily convert this high-risk but normal-behaving model into a malicious one on the victim's side by \textbf{flipping only one critical bit} on average. Moreover, our attack still poses a significant threat even when defenses are employed. The code for reproducing the main experiments is available at \url{https://github.com/jianshuod/TBA}.
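The premise behind BFA is that, in an 8-bit quantized model, toggling a single bit in memory can change a weight drastically. The sketch below illustrates only that generic effect. It assumes symmetric int8 quantization with a hypothetical scale of 0.01 and a toy three-weight layer, and the names `flip_bit`, `scale`, and `q_weights` are illustrative rather than taken from the TBA code. It does not reproduce the training-assisted attack itself, which jointly trains the released and malicious models. Flipping the most significant bit of a two's-complement weight turns 3 into -125:

```python
import numpy as np

def flip_bit(weights: np.ndarray, index: int, bit: int) -> np.ndarray:
    """Return a copy of an int8 tensor with bit `bit` flipped at flat position `index`."""
    if weights.dtype != np.int8:
        raise TypeError("expected an int8 (8-bit quantized) tensor")
    if not 0 <= bit < 8:
        raise ValueError("bit must be in [0, 7]")
    flipped = weights.copy()  # C-contiguous copy, so reshape(-1) below is a view
    raw = flipped.reshape(-1).view(np.uint8)  # same bytes, read as unsigned
    raw[index] ^= np.uint8(1 << bit)  # XOR toggles exactly one bit in memory
    return flipped

# Toy 8-bit quantized layer (hypothetical values): real weight = scale * int8 value.
scale = 0.01
q_weights = np.array([3, -2, 17], dtype=np.int8)
x = np.ones(3, dtype=np.float32)

clean_out = float(scale * (q_weights.astype(np.float32) @ x))
attacked = flip_bit(q_weights, index=0, bit=7)  # flip the sign (most significant) bit
attacked_out = float(scale * (attacked.astype(np.float32) @ x))

print(q_weights.tolist(), "->", attacked.tolist())  # [3, -2, 17] -> [-125, -2, 17]
print(round(clean_out, 2), "->", round(attacked_out, 2))  # 0.18 -> -1.1
```

A single flip moves the layer output from 0.18 to -1.1, which is why an attacker searches for a few critical bits rather than modifying many weights.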