Safe-VAR: Safe Visual Autoregressive Model for Text-to-Image Generative Watermarking

Following the success of autoregressive learning in large language models, autoregressive modeling has become a dominant approach for text-to-image generation, offering high efficiency and visual quality. However, invisible watermarking for visual autoregressive (VAR) models remains underexplored, despite its importance for preventing misuse. Existing watermarking methods, designed for diffusion models, often struggle to adapt to the sequential nature of VAR models. To bridge this gap, we propose Safe-VAR, the first watermarking framework specifically designed for autoregressive text-to-image generation. Our study reveals that the timing of watermark injection significantly affects generation quality, and that watermarks of different complexities have different optimal injection times. Motivated by this observation, we propose an Adaptive Scale Interaction Module, which dynamically determines the optimal watermark embedding strategy based on the watermark information and the visual characteristics of the generated image. This ensures watermark robustness while minimizing the impact on image quality. Furthermore, we introduce a Cross-Scale Fusion mechanism, which combines a mixture of heads with a mixture of experts to fuse multi-resolution features and handle complex interactions between image content and watermark patterns. Experimental results demonstrate that Safe-VAR achieves state-of-the-art performance, significantly surpassing existing methods in image quality, watermarking fidelity, and robustness against perturbations. Moreover, our method generalizes well to an out-of-domain watermark dataset of QR codes.
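To make the idea of scale-dependent injection concrete, the following is a minimal sketch and not the Safe-VAR implementation. It assumes each scale of the generator has been reduced to a pooled feature vector of the same length as the watermark feature, scores each scale by a dot product with the watermark feature, and uses a softmax over those scores to decide how strongly to inject at each scale. The names `scale_gate`, `inject`, and `strength` are hypothetical, and the dot-product gate is a stand-in for whatever learned module the paper uses.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scale_gate(image_feats, watermark_feat):
    """Return one injection weight per scale (weights sum to 1).

    Assumption: each entry of image_feats is a pooled feature vector
    with the same length as watermark_feat. The dot-product score is
    an illustrative stand-in for a learned gating network.
    """
    scores = [
        sum(a * b for a, b in zip(feat, watermark_feat))
        for feat in image_feats
    ]
    return softmax(scores)


def inject(image_feats, watermark_feat, strength=0.1):
    """Add the watermark feature to every scale, scaled by its gate weight."""
    weights = scale_gate(image_feats, watermark_feat)
    fused = [
        [x + strength * w * y for x, y in zip(feat, watermark_feat)]
        for feat, w in zip(image_feats, weights)
    ]
    return fused, weights


if __name__ == "__main__":
    # Three scales, each summarized by a 2-dimensional pooled feature.
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    watermark = [1.0, 1.0]
    fused, weights = inject(feats, watermark)
    print(weights)
    print(fused)
```

In this toy setup, the scale whose feature aligns best with the watermark feature receives the largest weight, so the injection is concentrated there rather than spread uniformly across scales.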

