Machine learning models are routinely deployed on a wide range of computing hardware. Although such hardware is typically expected to produce identical results, differences in its design can lead to small numerical variations during inference. In this work, we show that these variations can be exploited to create backdoors in machine learning models. The core idea is to shape the model's decision function such that it yields different predictions for the same input when executed on different hardware. This effect is achieved by locally moving the decision boundary close to a target input and then optimizing the resulting numerical deviations so that the prediction flips on selected hardware. We empirically demonstrate that these hardware-triggered backdoors can be created reliably across common GPU accelerators. Our findings reveal a novel attack vector affecting the use of third-party models, and we investigate different defenses to counter this threat.
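The effect underlying the attack can be observed directly: the same forward pass may yield slightly different floating-point results on different devices because accumulation order and kernel implementations differ. The following is a minimal sketch of this phenomenon, not the paper's method; it assumes PyTorch with an available CUDA GPU, and the toy model and input are hypothetical stand-ins.

```python
# Sketch: the same model can produce slightly different logits on CPU and
# GPU. If the logit margin at a target input is smaller than this
# deviation, the argmax prediction can differ across hardware.
import torch
import torch.nn as nn

torch.manual_seed(0)

# A small toy classifier with arbitrary weights, for illustration only.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
model.eval()

x = torch.randn(1, 32)  # stand-in for an attacker's target input

with torch.no_grad():
    logits_cpu = model(x)  # reference execution on the CPU
    if torch.cuda.is_available():
        logits_gpu = model.to("cuda")(x.to("cuda")).cpu()
        # Different accumulation orders and fused kernels typically lead
        # to tiny disagreements between the two executions.
        deviation = (logits_cpu - logits_gpu).abs().max().item()
        print(f"max logit deviation: {deviation:.2e}")
        # With the decision boundary moved close enough to x, such a
        # deviation suffices to flip the predicted class on one device:
        print("cpu prediction:", logits_cpu.argmax().item())
        print("gpu prediction:", logits_gpu.argmax().item())
```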