We propose Salient Conditional Diffusion (Sancdifi), a state-of-the-art defense against backdoor attacks. Sancdifi uses a denoising diffusion probabilistic model (DDPM) to degrade an image with noise and then recover it through the learned reverse diffusion process. Critically, we condition the diffusion on masks computed from saliency maps, so that the DDPM applies stronger diffusion to the most salient pixels. As a result, Sancdifi is highly effective at diffusing out triggers in data poisoned by backdoor attacks, while reliably recovering salient features when applied to clean data. It achieves this without access to the parameters of the Trojan network, so it operates as a black-box defense.
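To make the idea of saliency-conditioned noising concrete, the following is a minimal sketch and not the authors' implementation. It assumes a standard linear DDPM noise schedule, a binary mask obtained by thresholding a precomputed saliency map, and a single-channel image; the function names (`linear_alpha_bar`, `saliency_mask`, `forward_diffuse`, `masked_diffuse`) and the `top_fraction` parameter are hypothetical. The learned reverse diffusion step is omitted because it requires a trained DDPM.

```python
import numpy as np


def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products of (1 - beta_t) for a linear noise schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def saliency_mask(saliency, top_fraction=0.2):
    """Binary mask selecting roughly the top `top_fraction` most salient pixels."""
    threshold = np.quantile(saliency, 1.0 - top_fraction)
    return (saliency >= threshold).astype(np.float64)


def forward_diffuse(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) using the closed-form DDPM forward process."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise


def masked_diffuse(x0, mask, t, alpha_bar, rng):
    """Noise only the masked (salient) pixels; leave the rest untouched.

    This hard blend is an illustrative assumption about how a mask could
    condition the diffusion; the actual conditioning may differ.
    """
    x_t = forward_diffuse(x0, t, alpha_bar, rng)
    return mask * x_t + (1.0 - mask) * x0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((8, 8))      # stand-in for a grayscale input image
    saliency = rng.random((8, 8))   # stand-in for a saliency map
    mask = saliency_mask(saliency, top_fraction=0.25)
    noised = masked_diffuse(image, mask, t=500, alpha_bar=linear_alpha_bar(), rng=rng)
    print("pixels noised:", int(mask.sum()), "of", mask.size)
```

In a full pipeline, the output of `masked_diffuse` would be passed to a trained DDPM for reverse diffusion, and the recovered image would then be fed to the classifier being defended. Because only the input image is modified, a sketch of this kind does not require the classifier's parameters.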