Backdoor attacks aim to inject a backdoor into a classifier such that it predicts any input containing an attacker-chosen backdoor trigger as an attacker-chosen target class. Existing backdoor attacks require either retraining the classifier with some clean data or modifying the model's architecture. As a result, they are 1) not applicable when clean data is unavailable, 2) less efficient when the model is large, and 3) less stealthy due to architecture changes. In this work, we propose DFBA, a novel retraining-free and data-free backdoor attack that does not change the model architecture. Technically, our proposed method modifies a few parameters of a classifier to inject a backdoor. Through theoretical analysis, we verify that our injected backdoor is provably undetectable and unremovable by various state-of-the-art defenses under mild assumptions. Our evaluation on multiple datasets further demonstrates that our injected backdoor: 1) incurs negligible classification accuracy loss, 2) achieves 100% attack success rates, and 3) bypasses six existing state-of-the-art defenses. Moreover, our comparison with a state-of-the-art non-data-free backdoor attack shows that our attack is stealthier and more effective against various defenses while incurring lower classification accuracy loss.
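To illustrate the general idea of a retraining-free, data-free backdoor injected by editing a few parameters, here is a minimal sketch on a toy two-layer network. The construction below (repurposing one hidden neuron as a trigger-activated "switch" routed to the target class) is an illustrative assumption for exposition, not the paper's actual DFBA construction; all dimensions, thresholds, and weight values are made up.

```python
import numpy as np

# Illustrative sketch only: a data-free, retraining-free backdoor via
# direct parameter edits. Not the actual DFBA algorithm; all values
# below are hypothetical.

rng = np.random.default_rng(0)

# Toy 2-layer classifier: 16-dim input -> 8 hidden (ReLU) -> 3 classes.
W1 = rng.normal(0, 0.1, (8, 16))
b1 = np.zeros(8)
W2 = rng.normal(0, 0.1, (3, 8))
b2 = np.zeros(3)

def predict(x):
    h = np.maximum(0, W1 @ x + b1)   # ReLU hidden layer
    return int(np.argmax(W2 @ h + b2))

# Attacker-chosen trigger: input positions 0-3 stamped to value 1.0.
trigger_idx = np.arange(4)
target_class = 2

# Backdoor injection (no data, no retraining): repurpose hidden
# neuron 0 as a "switch" that fires only when the trigger is present.
W1[0, :] = 0.0
W1[0, trigger_idx] = 10.0            # strong response to trigger pixels
b1[0] = -35.0                        # threshold: needs ~all 4 trigger pixels
W2[:, 0] = 0.0
W2[target_class, 0] = 100.0          # switch neuron dominates the logits

x_clean = rng.normal(0, 0.2, 16)     # typical clean input, trigger absent
x_trig = x_clean.copy()
x_trig[trigger_idx] = 1.0            # stamp the trigger

# On clean inputs the switch neuron stays at zero (pre-activation is
# well below the -35 threshold), so the edited weights leave clean
# behavior essentially unchanged; on triggered inputs the switch
# forces the target class.
print(predict(x_trig))               # -> 2 (the target class)
```

Only a handful of weights are touched (one hidden-layer row and one output-layer column), which is what makes this style of attack cheap for large models and invisible at the architecture level.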