Efficient Backdoor Attacks for Deep Neural Networks in Real-world Scenarios

Recent deep neural networks (DNNs) rely on vast amounts of training data, which gives malicious attackers an opportunity to contaminate that data and carry out backdoor attacks. These attacks significantly undermine the reliability of DNNs. However, existing backdoor attack methods rest on unrealistic assumptions: that all training data comes from a single source and that attackers have full access to it. In this paper, we address this limitation by introducing a more realistic attack scenario in which victims collect data from multiple sources and attackers cannot access the complete training data. We refer to this scenario as data-constrained backdoor attacks. In such cases, previous attack methods suffer severe efficiency degradation because benign and poisoning features become entangled during the backdoor injection process. To tackle this problem, we propose a novel approach that leverages the pre-trained Contrastive Language-Image Pre-Training (CLIP) model. We introduce three CLIP-based techniques from two distinct streams: Clean Feature Suppression, which suppresses the influence of clean features so that poisoning features become more prominent, and Poisoning Feature Augmentation, which strengthens the presence and impact of poisoning features to manipulate the model's behavior more effectively. To evaluate the effectiveness, harmlessness to benign accuracy, and stealthiness of our method, we conduct extensive experiments on 3 target models, 3 datasets, and over 15 different settings. The results show substantial improvements over existing attacks in data-constrained scenarios, exceeding 100% in some settings. Our research addresses the limitations of existing methods and provides a practical and effective solution for data-constrained backdoor attacks.
