Evil from Within: Machine Learning Backdoors through Hardware Trojans

Backdoors pose a serious threat to machine learning, as they can compromise the integrity of security-critical systems, such as self-driving cars. While different defenses have been proposed to address this threat, they all rely on the assumption that the hardware on which the learning models are executed during inference is trusted. In this paper, we challenge this assumption and introduce a backdoor attack that resides entirely within a common hardware accelerator for machine learning. Outside of the accelerator, neither the learning model nor the software is manipulated, so current defenses fail. To make this attack practical, we overcome two challenges: First, as memory on a hardware accelerator is severely limited, we introduce the concept of a minimal backdoor that deviates as little as possible from the original model and is activated by replacing only a few model parameters. Second, we develop a configurable hardware trojan that can be provisioned with the backdoor and performs the replacement only when the specific target model is processed. We demonstrate the practical feasibility of our attack by implanting our hardware trojan into the Xilinx Vitis AI DPU, a commercial machine-learning accelerator. We configure the trojan with a minimal backdoor for a traffic-sign recognition system. The backdoor replaces only 30 model parameters (0.069%), yet it reliably manipulates the recognition once the input contains a backdoor trigger. Our attack expands the hardware circuit of the accelerator by 0.24% and induces no run-time overhead, making detection hardly possible. Given the complex and highly distributed manufacturing process of current hardware, our work points to a new threat in machine learning that is inaccessible to current security mechanisms and calls for hardware to be manufactured only in fully trusted environments.
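The two ingredients described above, a small set of replaced parameters and a trojan that acts only on one specific model, can be illustrated with a toy software simulation. This is a minimal sketch under stated assumptions, not the authors' hardware design: the names `TrojanConfig` and `load_parameters`, the idea of recognizing the target model by comparing a few parameter values, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrojanConfig:
    """Hypothetical trojan configuration; every field is illustrative."""
    fingerprint: dict[int, int]   # index -> value expected in the target model
    replacements: dict[int, int]  # index -> backdoored value to substitute


def load_parameters(params: list[int], trojan: TrojanConfig) -> list[int]:
    """Simulate an accelerator loading quantized model parameters.

    The list stored outside the accelerator (``params``) is never modified;
    only the copy used inside the simulated accelerator may differ.
    """
    # Assumption: the target model is recognized by a few known parameter values.
    is_target = all(
        idx < len(params) and params[idx] == value
        for idx, value in trojan.fingerprint.items()
    )
    loaded = list(params)
    if is_target:
        # Swap in the handful of backdoored parameters.
        for idx, value in trojan.replacements.items():
            if idx < len(loaded):
                loaded[idx] = value
    return loaded


# Toy data: two tiny "models" that differ only in their first parameter.
target_model = [3, -7, 12, 0, 5, 9, -2, 4]
other_model = [1, -7, 12, 0, 5, 9, -2, 4]
trojan = TrojanConfig(fingerprint={0: 3, 2: 12}, replacements={4: -128, 6: 127})

loaded_target = load_parameters(target_model, trojan)
loaded_other = load_parameters(other_model, trojan)
changed = sum(a != b for a, b in zip(target_model, loaded_target))

print(loaded_target)
print(loaded_other)
print(changed)
```

In this sketch the model held outside the simulated accelerator stays byte-for-byte identical, which mirrors why defenses that inspect only the model or the software would see nothing unusual. Models that do not match the fingerprint pass through unchanged.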
