State-of-the-art deep neural networks (DNNs) have been shown to be vulnerable to adversarial manipulation and backdoor attacks. Backdoored models deviate from expected behavior on inputs containing predefined triggers while retaining performance on clean data. Recent works focus on software simulation of backdoor injection during the inference phase by modifying network weights, which we find is often unrealistic in practice due to hardware restrictions. In contrast, we present, for the first time, an end-to-end backdoor injection attack realized on actual hardware against a classifier model, using Rowhammer as the fault injection method. To this end, we first investigate the viability of backdoor injection attacks in real-life deployments of DNNs on hardware and address the practical issues of hardware implementation from a novel optimization perspective. We are motivated by the fact that vulnerable memory locations are very rare, device-specific, and sparsely distributed. Consequently, we propose a novel network training algorithm based on constrained optimization to achieve a realistic backdoor injection attack in hardware. By modifying parameters uniformly across the convolutional and fully-connected layers and jointly optimizing the trigger pattern, we achieve state-of-the-art attack performance with fewer bit flips. For instance, on a hardware-deployed ResNet-20 model trained on CIFAR-10, our method achieves over 89% test accuracy and a 92% attack success rate by flipping only 10 out of 2.2 million bits.
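To make the hardware constraint concrete, the following is a minimal sketch of what a bit-flip fault does to a stored weight, and of why only some flips are available to an attacker. It assumes 8-bit two's-complement quantized weights, which the abstract does not state. The names `flip_bit`, `apply_flips`, and `flippable` are hypothetical, and the set of vulnerable cells is invented for illustration. This is not the paper's training algorithm; it only models the restriction that a requested flip takes effect solely where the memory is actually vulnerable.

```python
def flip_bit(weight: int, bit: int) -> int:
    """Flip one bit of a signed 8-bit weight (two's complement, an assumption)."""
    if not -128 <= weight <= 127:
        raise ValueError("weight must fit in a signed 8-bit integer")
    if not 0 <= bit <= 7:
        raise ValueError("bit must be in the range 0..7")
    raw = weight & 0xFF      # reinterpret the signed value as an unsigned byte
    raw ^= 1 << bit          # toggle the chosen bit
    return raw - 256 if raw >= 128 else raw  # convert back to a signed value


def apply_flips(weights, flips, flippable):
    """Apply (index, bit) flips, skipping any location not in `flippable`."""
    out = list(weights)
    for index, bit in flips:
        if (index, bit) not in flippable:
            continue  # this cell is not vulnerable on the device; no change
        out[index] = flip_bit(out[index], bit)
    return out


weights = [23, -5, 100, 0]            # toy quantized weights
flippable = {(0, 7), (2, 1)}          # hypothetical profile of vulnerable cells
requested = [(0, 7), (1, 7), (2, 1)]  # flips an unconstrained attack would want
patched = apply_flips(weights, requested, flippable)
print(patched)
```

In this toy run, flipping the top bit of the first weight changes it from 23 to -105, the second request is dropped because that cell is not in the vulnerable set, and the third changes 100 to 102. A software-only simulation that ignores `flippable` would apply all three flips, which is the gap between simulated and hardware-realizable attacks that the abstract highlights.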