As large language models (LLMs) become integral to applications such as question answering and content creation, reliable content attribution has become increasingly important. Watermarking is a promising approach, but existing methods either provide only binary signals or distort the sampling distribution, degrading text quality; distortion-free approaches, in turn, often suffer from weak detectability or robustness. We propose MirrorMark, a multi-bit, distortion-free watermark for LLMs. By mirroring sampling randomness in a measure-preserving manner, MirrorMark embeds multi-bit messages without altering the token probability distribution, preserving text quality by design. To improve robustness, we introduce a context-based scheduler that balances token assignments across message positions while remaining resilient to insertions and deletions. We further provide a theoretical analysis of the equal error rate to interpret empirical performance. Experiments show that MirrorMark matches the text quality of non-watermarked generation while achieving substantially stronger detectability: with 54 bits embedded in 300 tokens, it improves bit accuracy by 8-12% and correctly identifies up to 11% more watermarked texts at a 1% false positive rate.
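The core "mirroring" idea can be illustrated with a minimal sketch. This is an interpretive reconstruction, not the paper's exact construction: the names `prf_uniform`, `sample_token`, and `watermarked_step` are hypothetical, and the context-based scheduler is omitted. The sketch shows why the embedding is distortion-free: a message bit flips the pseudorandom uniform u to 1 - u, and since 1 - u is also uniform on (0, 1), inverse-CDF sampling yields exactly the model's token distribution either way.

```python
import bisect
import hashlib

def prf_uniform(key: bytes, context: tuple) -> float:
    """Derive a pseudorandom u in (0, 1) from a secret key and the
    preceding token context (a keyed hash standing in for a PRF)."""
    h = hashlib.sha256(key + repr(context).encode()).digest()
    return (int.from_bytes(h[:8], "big") + 0.5) / 2**64

def sample_token(probs: list, u: float) -> int:
    """Inverse-CDF sampling: return the token whose CDF interval contains u."""
    cdf, acc = [], 0.0
    for p in probs:
        acc += p
        cdf.append(acc)
    return bisect.bisect_left(cdf, u)

def watermarked_step(probs: list, key: bytes, context: tuple, bit: int) -> int:
    """Embed one message bit by mirroring the sampling randomness:
    bit 0 uses u, bit 1 uses 1 - u. Because the map u -> 1 - u is
    measure-preserving on (0, 1), the sampled token still follows
    `probs` exactly, so the watermark is distortion-free."""
    u = prf_uniform(key, context)
    return sample_token(probs, u if bit == 0 else 1.0 - u)
```

For detection, a verifier holding the key can re-derive u for each position and check whether u or 1 - u is consistent with the observed token; positions where both fall in the same token's interval carry no signal, so evidence would be aggregated across the tokens assigned to each message position (the role the paper's scheduler plays).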