Deep neural networks are vulnerable to backdoor attacks, a type of adversarial attack that poisons the training data to manipulate the behavior of models trained on such data. Clean-label attacks are a stealthier form of backdoor attack that succeeds without changing the labels of the poisoned data. Early works on clean-label attacks added triggers to a random subset of the training set, ignoring the fact that samples contribute unequally to the attack's success; this results in high poisoning rates yet low attack success rates. To alleviate the problem, several supervised learning-based sample selection strategies have been proposed. However, these methods assume access to the entire labeled training set and require training, which is expensive and may not always be practical. This work studies a new and more practical (but also more challenging) threat model where the attacker only provides data for the target class (e.g., in face recognition systems) and has no knowledge of the victim model or any other classes in the training set. We study different strategies for selectively poisoning a small set of training samples in the target class to boost the attack success rate in this setting. Our threat model poses a serious risk to machine learning models trained on third-party datasets, since the attack can be performed effectively with limited information. Experiments on benchmark datasets illustrate the effectiveness of our strategies in improving clean-label backdoor attacks.
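The core mechanism described above can be sketched in a few lines: stamp a small trigger patch onto a selectively chosen fraction of target-class images, leaving their labels untouched. This is a minimal illustrative sketch, not the paper's implementation; the function names, the square-patch trigger, and the per-sample `scores` array (standing in for whichever selection strategy is used) are all assumptions for illustration.

```python
import numpy as np

def apply_trigger(images, trigger, position=(0, 0)):
    """Stamp a small trigger patch onto each image; labels are never touched."""
    poisoned = images.copy()
    h, w = trigger.shape[:2]
    r, c = position
    poisoned[:, r:r + h, c:c + w] = trigger
    return poisoned

def select_and_poison(target_images, scores, trigger, poison_rate=0.1):
    """Poison only the highest-scoring fraction of target-class samples.

    `scores` is a hypothetical per-sample importance score; the paper
    studies several such selection strategies, all computable from the
    target-class data alone (no victim model, no other classes).
    """
    k = max(1, int(poison_rate * len(target_images)))
    idx = np.argsort(scores)[-k:]  # top-k samples deemed most useful
    poisoned = target_images.copy()
    poisoned[idx] = apply_trigger(target_images[idx], trigger)
    return poisoned, idx
```

The contrast with early clean-label attacks is the `argsort` step: replacing it with a random choice of `k` indices recovers random selection, which wastes the poisoning budget on samples that contribute little to the attack's success.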