The rapid progress of graph generation has raised new security concerns, particularly regarding backdoor vulnerabilities. Though prior work has explored backdoor attacks against diffusion models for image or unconditional graph generation, those against conditional graph generation models, especially text-guided graph generation models, remain largely unexamined. This paper proposes BadGraph, a backdoor attack method against latent diffusion models for text-guided graph generation. BadGraph leverages textual triggers to poison training data, covertly implanting backdoors that induce attacker-specified subgraphs during inference when triggers appear, while preserving normal performance on clean inputs. Extensive experiments on four benchmark datasets (PubChem, ChEBI-20, PCDes, MoMu) demonstrate the effectiveness and stealth of the attack: a poisoning rate of less than 10% can achieve a 50% attack success rate, while 24% suffices for over an 80% success rate, with negligible performance degradation on benign samples. Ablation studies further reveal that the backdoor is implanted during VAE and diffusion training rather than pretraining. These findings reveal the security vulnerabilities in latent diffusion models for text-guided graph generation, highlight the serious risks in applications such as drug discovery, and underscore the need for robust defenses against the backdoor attack in such diffusion models.
翻译:图生成的快速发展引发了新的安全担忧,尤其是后门漏洞问题。尽管已有研究探索了针对图像或无约束图生成的扩散模型后门攻击,但对条件图生成模型(尤其是文本引导图生成模型)的攻击尚未得到充分研究。本文提出BadGraph,一种针对文本引导图生成的潜在扩散模型后门攻击方法。BadGraph利用文本触发器污染训练数据,隐蔽地植入后门,使模型在推理时遇到触发器即生成攻击者指定的子图,同时保持干净输入下的正常性能。在四个基准数据集(PubChem、ChEBI-20、PCDes、MoMu)上的大量实验证明了该攻击的有效性与隐蔽性:低于10%的污染率即可实现50%的攻击成功率,24%的污染率即可达成超过80%的成功率,且对良性样本的性能退化可忽略不计。消融研究进一步揭示,后门是在VAE和扩散训练阶段植入的,而非预训练阶段。这些发现揭示了文本引导图生成的潜在扩散模型中的安全漏洞,突显了在药物发现等应用中的严重风险,并强调了对此类扩散模型中后门攻击建立强健防御的必要性。