The proliferation of autoregressive (AR) image generators demands reliable detection and attribution of their outputs, both to mitigate misinformation and to filter synthetic images from training data to prevent model collapse. To address this need, watermarking techniques designed specifically for AR models embed a subtle signal at generation time, enabling downstream verification through a corresponding watermark detector. In this work, we study these schemes and demonstrate their vulnerability to both watermark removal and forgery attacks. We assess existing attacks and introduce three new ones: (i) a vector-quantized regeneration removal attack, (ii) an adversarial optimization-based attack, and (iii) a frequency injection attack. Our evaluation reveals that removal and forgery attacks can be effective with access to only a single watermarked reference image, and without access to the original model parameters or watermarking secrets. Our findings indicate that existing watermarking schemes for AR image generation do not reliably support synthetic content detection for dataset filtering. Moreover, they enable Watermark Mimicry, whereby authentic images are manipulated to imitate a generator's watermark and trigger false detection, thereby preventing their inclusion in future model training.