Watermarking methods for language models have been studied extensively in the autoregressive setting, where tokens are generated sequentially. Existing work largely focuses on local-context schemes that perturb the next token's distribution as a function of its preceding tokens. In diffusion language models, distributions over many unresolved positions are sampled jointly, which makes additive statistics of the entire sequence tractable during generation. We propose a watermark for masked diffusion language models that controls a global, vector-valued sketch representation of the text. Compared to context-dependent watermarking, the sketch formulation decouples detection from the local contexts seen during generation, yielding an order-agnostic statistic and a watermarking rule that does not manifest as a simple token bias. We analyze the distortion, soundness, and robustness properties of the method.