Digital watermarking is essential for securing images generated by diffusion models. Accurate watermark evaluation is critical for algorithm development, yet existing methods have significant limitations: they lack a unified framework covering both residual and semantic watermarks, provide results without interpretability, neglect comprehensive security considerations, and often apply inappropriate metrics to semantic watermarks. To address these gaps, we propose WMVLM, the first unified and interpretable evaluation framework for diffusion model image watermarking based on vision-language models (VLMs). We redefine quality and security metrics for each watermark type: residual watermarks are evaluated by artifact strength and erasure resistance, while semantic watermarks are assessed through latent distribution shifts. Moreover, we introduce a three-stage training strategy that progressively enables the model to perform classification, scoring, and interpretable text generation. Experiments show that WMVLM outperforms state-of-the-art VLMs and generalizes well across datasets, diffusion models, and watermarking methods.