Generative models such as GANs and diffusion models are widely used to synthesize photorealistic images and to support downstream creative and editing tasks. While adversarial attacks on discriminative models are well studied, attacks targeting generative pipelines, in which small, stealthy input perturbations induce controlled changes in outputs, remain far less explored. This study introduces VagueGAN, an attack pipeline that combines a modular perturbation network, PoisonerNet, with a Generator-Discriminator pair to craft stealthy triggers that cause targeted changes in generated images. Attack efficacy is evaluated using a custom proxy metric, while stealth is analyzed through perceptual and frequency-domain measures. The transferability of the method to a modern diffusion-based pipeline is further examined through ControlNet-guided editing. Interestingly, the experiments show that poisoned outputs can exhibit higher visual quality than their clean counterparts, challenging the assumption that poisoning necessarily reduces fidelity. Unlike conventional pixel-level perturbations, latent-space poisoning in GANs and diffusion pipelines can retain or even enhance output aesthetics, exposing a blind spot in pixel-level defenses. Moreover, carefully optimized perturbations can produce consistent, stealthy effects on generator outputs while remaining visually inconspicuous, raising concerns about the integrity of image generation pipelines.
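To make the described pipeline concrete, the following is a minimal sketch, not the authors' implementation, of the core idea: a PoisonerNet-style module that adds a norm-bounded trigger to latent codes before a generator, with a simple closeness constraint standing in for the paper's stealth analysis. The architecture sizes, the `eps` budget, the toy generator, and the MSE stealth proxy are all illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's code): a PoisonerNet-style
# perturbation module that injects a bounded trigger into latent codes
# before a GAN generator. All hyperparameters here are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PoisonerNet(nn.Module):
    """Maps a latent code to a small additive trigger bounded by eps."""
    def __init__(self, latent_dim: int = 128, eps: float = 0.05):
        super().__init__()
        self.eps = eps
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256),
            nn.ReLU(),
            nn.Linear(256, latent_dim),
            nn.Tanh(),  # output in [-1, 1], scaled to the stealth budget
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        delta = self.eps * self.net(z)  # norm-bounded perturbation
        return z + delta                # poisoned latent code

# Toy stand-in for the Generator; any latent-to-image map would do here.
generator = nn.Sequential(nn.Linear(128, 3 * 32 * 32), nn.Tanh())

poisoner = PoisonerNet(latent_dim=128, eps=0.05)
z = torch.randn(16, 128)
clean_out = generator(z)
poisoned_out = generator(poisoner(z))

# Stealth proxy: poisoned outputs should stay close to clean ones. The paper
# uses perceptual and frequency-domain measures; plain MSE is a stand-in.
# A separate target loss (omitted) would steer the poisoned outputs toward
# the attacker's desired change.
stealth_loss = F.mse_loss(poisoned_out, clean_out)
print(f"stealth proxy (MSE, clean vs. poisoned outputs): {stealth_loss.item():.6f}")
```

In a full attack, the stealth proxy would be traded off against a target objective during PoisonerNet's optimization, which is what lets the trigger remain visually inconspicuous while still steering the generator.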