With the rapid advancement of image generative models, generative data augmentation has become an effective way to enrich training images, especially when only small-scale datasets are available. In practical deployments, however, generative data augmentation can be vulnerable to clean-label backdoor attacks, which are designed to evade human inspection. Based on theoretical analysis and preliminary experiments, we observe that directly applying existing pixel-level clean-label backdoor attack methods (e.g., COMBAT) to generated images yields low attack success rates. This motivates us to move beyond pixel-level triggers and instead operate at the latent feature level. To this end, we propose InvLBA, an invisible clean-label backdoor attack on generative data augmentation via latent perturbation. We theoretically prove generalization guarantees on both the clean accuracy and the attack success rate of InvLBA. Experiments on multiple datasets show that our method improves the attack success rate by 46.43% on average, with almost no reduction in clean accuracy and high robustness against state-of-the-art defense methods.
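To make the pixel-level vs. latent-level contrast concrete, the following is a minimal, purely illustrative PyTorch sketch, not the paper's implementation: the `TinyAutoencoder`, the trigger direction `delta`, and the budget `eps` are hypothetical stand-ins, since the abstract does not specify InvLBA's generative model, trigger, or optimization.

```python
# Minimal illustrative sketch (not the paper's method): a fixed pixel-level
# trigger vs. a perturbation applied in latent space and decoded back to an
# image. TinyAutoencoder, delta, and eps are hypothetical stand-ins.
import torch
import torch.nn as nn


def pixel_trigger(images: torch.Tensor, patch_size: int = 3) -> torch.Tensor:
    """Stamp a fixed white patch in the bottom-right corner, the general
    flavor of pixel-level triggers that transfer poorly to generated images."""
    poisoned = images.clone()
    poisoned[:, :, -patch_size:, -patch_size:] = 1.0
    return poisoned


class TinyAutoencoder(nn.Module):
    """Untrained stand-in for a generative model's encoder/decoder pair."""

    def __init__(self, channels: int = 3, latent_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, 16, 3, stride=2, padding=1),  # 32x32 -> 16x16
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(16 * 16 * 16, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 16 * 16 * 16),
            nn.ReLU(),
            nn.Unflatten(1, (16, 16, 16)),
            nn.ConvTranspose2d(16, channels, 4, stride=2, padding=1),  # -> 32x32
            nn.Sigmoid(),
        )


def latent_perturbation(model: TinyAutoencoder, images: torch.Tensor,
                        delta: torch.Tensor, eps: float = 0.05) -> torch.Tensor:
    """Perturb in latent space, then decode: the trigger lives in feature
    space, so the returned image carries no fixed pixel pattern."""
    z = model.encoder(images)
    z_poisoned = z + eps * delta / delta.norm()  # small, normalized latent shift
    return model.decoder(z_poisoned)


if __name__ == "__main__":
    model = TinyAutoencoder()
    x = torch.rand(4, 3, 32, 32)   # pretend batch of generated 32x32 images
    delta = torch.randn(64)        # hypothetical learned trigger direction
    print(pixel_trigger(x).shape, latent_perturbation(model, x, delta).shape)
```

A real attack would learn `delta` so that poisoned but correctly labeled samples install the backdoor; this sketch only shows where the perturbation is applied, which is the distinction the abstract draws between pixel-level and latent-level triggers.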