The rapid progress of graph generation has raised new security concerns, particularly regarding backdoor vulnerabilities. While prior work has explored backdoor attacks in image diffusion and unconditional graph generation, conditional generation, especially text-guided graph generation, remains largely unexamined. This paper proposes BadGraph, a backdoor attack method targeting latent diffusion models for text-guided graph generation. BadGraph leverages textual triggers to poison training data, covertly implanting backdoors that induce attacker-specified subgraphs during inference when the triggers appear, while preserving normal performance on clean inputs. Extensive experiments on four benchmark datasets (PubChem, ChEBI-20, PCDes, MoMu) demonstrate the effectiveness and stealth of the attack: a poisoning rate of less than 10% achieves a 50% attack success rate, while 24% suffices for over 80% success, with negligible performance degradation on benign samples. Ablation studies further reveal that the backdoor is implanted during VAE and diffusion training rather than during pretraining. These findings expose security vulnerabilities in latent diffusion models for text-guided graph generation, highlight the serious risks such models pose in applications like drug discovery, and underscore the need for robust defenses against backdoor attacks on such diffusion models.