Deep Neural Networks (DNNs) are vulnerable to backdoor poisoning attacks, with most research focusing on digital triggers: special patterns added digitally to test-time inputs to induce targeted misclassification. In contrast, physical triggers, which are natural objects within a physical scene, have emerged as a desirable alternative since they enable real-time backdoor activation without digital manipulation. However, current physical attacks require poisoned inputs to carry incorrect labels, making them easily detectable upon human inspection. In this paper, we collect a facial dataset of 21,238 images featuring 7 common accessories as triggers and use it to study the threat of clean-label backdoor attacks in the physical world. Our study reveals two findings. First, the success of a physical attack depends on the poisoning algorithm, the physical trigger, and the source-target class pair. Second, although clean-label poisoned samples preserve their ground-truth labels, their perceptual quality can be severely degraded by conspicuous artifacts in the images. Such samples are also vulnerable to statistical filtering methods because they deviate from the distribution of clean samples in feature space. To address these issues, we propose replacing the standard $\ell_\infty$ regularization with novel pixel and feature regularizations that enhance the imperceptibility of poisoned samples without compromising attack performance. Our study also highlights accidental backdoor activations as a key limitation of clean-label physical backdoor attacks: unintended objects or classes can inadvertently cause the model to misclassify inputs as the target class.
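To make the regularization change concrete, the following is a minimal sketch of how such an objective might look; the notation ($\mathcal{L}_{\text{atk}}$ for the poisoning loss, $f$ for the feature extractor, and the hyperparameters $\lambda_{\text{pix}}$, $\lambda_{\text{feat}}$, $\epsilon$) is illustrative and not taken from the paper. Standard clean-label poisoning constrains the perturbation $\delta$ of a clean image $x$ to an $\ell_\infty$ ball,

$$\min_{\delta} \; \mathcal{L}_{\text{atk}}(x + \delta) \quad \text{s.t.} \quad \|\delta\|_\infty \le \epsilon,$$

which bounds per-pixel change but still permits conspicuous high-frequency artifacts. A soft pixel-and-feature penalty of the kind described above would instead take the form

$$\min_{\delta} \; \mathcal{L}_{\text{atk}}(x + \delta) + \lambda_{\text{pix}} \|\delta\|_2^2 + \lambda_{\text{feat}} \|f(x + \delta) - f(x)\|_2^2,$$

where the pixel term suppresses visible artifacts and the feature term keeps the poisoned sample close to the clean feature distribution, which is precisely the statistic that filtering defenses inspect.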