Evil from Within: Machine Learning Backdoors through Hardware Trojans

Backdoors pose a serious threat to machine learning, as they can compromise the integrity of security-critical systems, such as self-driving cars. While different defenses have been proposed to address this threat, they all rely on the assumption that the hardware on which the learning models are executed during inference is trusted. In this paper, we challenge this assumption and introduce a backdoor attack that resides entirely within a common hardware accelerator for machine learning. Outside of the accelerator, neither the learning model nor the software is manipulated, so current defenses fail. To make this attack practical, we overcome two challenges. First, as memory on a hardware accelerator is severely limited, we introduce the concept of a minimal backdoor that deviates as little as possible from the original model and is activated by replacing only a few model parameters. Second, we develop a configurable hardware trojan that can be provisioned with the backdoor and performs the replacement only when the specific target model is processed. We demonstrate the practical feasibility of our attack by implanting our hardware trojan into the Xilinx Vitis AI DPU, a commercial machine-learning accelerator. We configure the trojan with a minimal backdoor for a traffic-sign recognition system. The backdoor replaces only 30 model parameters (0.069%), yet it reliably manipulates the recognition once the input contains a backdoor trigger. Our attack expands the hardware circuit of the accelerator by 0.24% and induces no run-time overhead, making detection hardly possible. Given the complex and highly distributed manufacturing process of current hardware, our work points to a new threat in machine learning that is inaccessible to current security mechanisms and calls for hardware to be manufactured only in fully trusted environments.
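
The parameter-replacement idea described in the abstract can be illustrated with a small software sketch. This is not the authors' hardware design: the class `ToyTrojan`, the helpers `fingerprint` and `changed_fraction`, and the use of a SHA-256 hash over the first few parameters to recognize the target model are all hypothetical choices made for illustration. The abstract does not say how the trojan identifies its target model. The sketch only shows the general pattern: parameters pass through unchanged unless the target model is recognized, in which case a small, preconfigured set of values is swapped in.

```python
import hashlib
from typing import Dict, List, Sequence


def fingerprint(params: Sequence[int], n: int = 16) -> str:
    """Hash the first n parameters as a stand-in model identifier.

    Assumption: the real trojan's target-detection mechanism is not
    described in the abstract; this hash is purely illustrative.
    """
    return hashlib.sha256(bytes(p & 0xFF for p in params[:n])).hexdigest()


class ToyTrojan:
    """Software stand-in for a configurable trojan (hypothetical)."""

    def __init__(self, target_fp: str, replacements: Dict[int, int]):
        self.target_fp = target_fp
        self.replacements = replacements  # parameter index -> new value

    def load(self, params: Sequence[int]) -> List[int]:
        """Return the parameters as the accelerator would use them."""
        out = list(params)
        # Replace values only when the target model is recognized.
        if fingerprint(params) == self.target_fp:
            for idx, value in self.replacements.items():
                out[idx] = value
        return out


def changed_fraction(original: Sequence[int], loaded: Sequence[int]) -> float:
    """Fraction of parameters that differ after loading."""
    diff = sum(1 for a, b in zip(original, loaded) if a != b)
    return diff / len(original)


# Toy data: two small "models" with quantized integer parameters.
target_model = [(i * 7) % 127 for i in range(1000)]
other_model = [(i * 11) % 127 for i in range(1000)]

trojan = ToyTrojan(
    target_fp=fingerprint(target_model),
    replacements={100: -5, 200: 9, 300: 1},
)

backdoored = trojan.load(target_model)  # three parameters replaced
untouched = trojan.load(other_model)    # passes through unchanged
print(changed_fraction(target_model, backdoored))
```

In this toy setup the stored model and the software that loads it are never modified; the change exists only in what `load` returns, which mirrors why defenses that inspect the model outside the accelerator would not observe the manipulation.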
