We introduce the first watermark tailored for diffusion language models (DLMs), an emerging LLM paradigm that generates tokens in arbitrary order, in contrast to standard autoregressive language models (ARLMs), which generate tokens sequentially. While there has been much work on ARLM watermarking, a key challenge in applying these schemes directly to the DLM setting is that they rely on previously generated tokens, which are not always available during DLM generation. In this work, we address this challenge by: (i) applying the watermark in expectation over the context even when some context tokens are yet to be determined, and (ii) promoting tokens that increase the watermark strength when used as context for other tokens. This is accomplished while keeping the watermark detector unchanged. Our experimental evaluation demonstrates that the DLM watermark achieves a >99% true positive rate with minimal quality impact and similar robustness to existing ARLM watermarks, enabling reliable DLM watermarking for the first time.
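To make idea (i) concrete, the following is a minimal sketch assuming a KGW-style red/green-list watermark, where a hash of a context token seeds a pseudorandom split of the vocabulary and green tokens receive a logit boost. When the context token is still masked, the boost is taken in expectation over the model's current distribution for that position. All function names, the hashing scheme, and the parameters `gamma` and `delta` are illustrative assumptions, not the paper's exact construction.

```python
import hashlib
import numpy as np

def green_mask(context_token: int, vocab_size: int, gamma: float = 0.5) -> np.ndarray:
    """Deterministic green-list indicator seeded by the context token.

    gamma is the fraction of the vocabulary marked green (KGW-style;
    the exact hashing used in the paper is an assumption here).
    """
    seed = int(hashlib.sha256(str(context_token).encode()).hexdigest(), 16) % (2**32)
    rng = np.random.default_rng(seed)
    mask = np.zeros(vocab_size)
    mask[rng.permutation(vocab_size)[: int(gamma * vocab_size)]] = 1.0
    return mask

def expected_watermark_bias(context_probs: np.ndarray,
                            vocab_size: int,
                            delta: float = 2.0) -> np.ndarray:
    """Logit bias for the current position when its context is undetermined.

    Instead of conditioning on one fixed context token, average the
    green-list bias over the model's distribution for the masked context.
    """
    bias = np.zeros(vocab_size)
    for c, p in enumerate(context_probs):
        if p > 0:
            bias += p * green_mask(c, vocab_size)
    return delta * bias
```

When the context token is fully determined (a one-hot distribution), this reduces to the standard ARLM green-list boost, which is why the unchanged detector described in the abstract can still score the output.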