Bit-flip attacks (BFAs) can manipulate deep neural networks (DNNs). For high-level DNN models running on deep learning (DL) frameworks like PyTorch, BFAs have been used extensively to flip bits in model weights and have proven effective; defenses have likewise been proposed to guard model weights. However, DNNs are increasingly compiled into DNN executables by DL compilers to leverage hardware primitives. These executables manifest distinct computation paradigms, and existing research fails to accurately capture and expose the BFA surfaces on DNN executables. To this end, we launch the first systematic study of BFAs on DNN executables. Prior BFAs are limited to attacking model weights and assume a strong whitebox attacker with full knowledge of the victim's weights, which is unrealistic because weights are often confidential. In contrast, we find that BFAs on DNN executables can achieve high effectiveness by exploiting the model structure (usually stored in the executable's code), which requires knowing only the (often public) model structure. Importantly, such structure-based BFAs are pervasive, transferable, and more severe in DNN executables; they also slip past existing defenses. To demonstrate the new attack surfaces, we assume a weak and more realistic attacker with no knowledge of the victim's model weights. We design an automated tool that identifies vulnerable bits in victim executables with high confidence (70% vs. a 2% baseline). On DDR4 DRAM, we show that only 1.4 flips on average are needed to fully downgrade the accuracy of victim models to random guesses, including quantized models, which previously required up to 23x more flips. We comprehensively evaluate 16 DNN executables, covering large-scale models trained on commonly used datasets and compiled by the two most popular DL compilers. Our findings call for incorporating security mechanisms into future DNN compilation toolchains.
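As a minimal illustration of why a single flipped bit can be catastrophic (this sketch is ours, not the paper's tooling), flipping one bit in the exponent field of an IEEE-754 float32 parameter can change its magnitude by dozens of orders of magnitude, which is the kind of corruption a Rowhammer-style BFA induces in memory:

```python
import struct

def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its float32 encoding flipped.

    `bit` is the bit index (0 = least significant mantissa bit,
    31 = sign bit); bits 23-30 form the exponent field.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    bits ^= 1 << bit
    (flipped,) = struct.unpack("<f", struct.pack("<I", bits))
    return flipped

# Flipping the most significant exponent bit (bit 30) of a benign
# weight value 0.5 yields 2**127 (about 1.7e38), enough to saturate
# downstream activations in a single forward pass.
print(flip_bit(0.5, 30))
```

The same involutive XOR means a second flip of the identical bit restores the original value, which is why such faults are hard to detect after the fact.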