Large language models now produce text that is increasingly difficult to distinguish from human writing, which raises the need for reliable provenance tracing. Multi-bit watermarking can embed identifiers into generated text, but existing methods struggle to preserve both text quality and watermark strength when carrying long messages. We propose MC$^2$Mark, a distortion-free multi-bit watermarking framework designed for reliable embedding and decoding of long messages. Its key technical idea is Multi-Channel Colored Reweighting, which encodes bits through structured token reweighting while keeping the token distribution unbiased. This is complemented by Multi-Layer Sequential Reweighting, which strengthens the watermark signal, and by an evidence-accumulation detector for message recovery. Experiments show that MC$^2$Mark improves detectability and robustness over prior multi-bit watermarking methods while preserving generation quality, achieving near-perfect accuracy for short messages and outperforming the second-best method by nearly 30% on long messages.
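The abstract does not specify the reweighting rule used by MC$^2$Mark, so the Python below is only a minimal sketch, under stated assumptions, of the property it targets: a bit-dependent token reweighting that is unbiased when averaged over a uniformly random secret key. It uses a generic γ-style reweighting borrowed from earlier unbiased-watermarking work (a swapped-in technique, not the authors' Multi-Channel Colored Reweighting). The helpers `keyed_order` (a keyed pseudorandom vocabulary permutation) and `channel_for` (assigning token position $t$ to message bit $t \bmod L$) are hypothetical illustrations, not part of the paper.

```python
import itertools
import random
from fractions import Fraction


def gamma_reweight(probs, order):
    """Generic gamma-style reweighting (swapped-in illustration, not MC^2Mark).

    Tokens are laid out on [0, 1] in the given order, and each token's
    interval is remapped through g(u) = max(2u - 1, 0): the first half of
    the cumulative mass is suppressed and the second half is doubled.
    """
    def g(u):
        return max(2 * u - 1, 0)

    out = [0] * len(probs)
    cum = 0
    for tok in order:
        lo, hi = cum, cum + probs[tok]
        out[tok] = g(hi) - g(lo)
        cum = hi
    return out


def encode_bit(probs, order, bit):
    """Hypothetical bit encoding: bit 1 reverses the keyed order, so which
    tokens get boosted depends on the message bit."""
    return gamma_reweight(probs, order if bit == 0 else order[::-1])


def keyed_order(key, position, vocab_size):
    """Hypothetical keyed pseudorandom permutation of the vocabulary."""
    rng = random.Random(f"{key}:{position}")
    order = list(range(vocab_size))
    rng.shuffle(order)
    return order


def channel_for(position, message_len):
    """Hypothetical channel assignment: token position t carries bit t mod L."""
    return position % message_len


def expected_over_keys(probs, bit):
    """Average the reweighted distribution over every vocabulary ordering,
    i.e. over a uniformly random key. Unbiasedness means the result equals
    the original distribution for either bit value."""
    n = len(probs)
    perms = list(itertools.permutations(range(n)))
    total = [Fraction(0)] * n
    for order in perms:
        for tok, p in enumerate(encode_bit(probs, list(order), bit)):
            total[tok] += p
    return [t / len(perms) for t in total]


if __name__ == "__main__":
    P = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
    # Exact check of unbiasedness over all 4! orderings, for both bit values.
    for b in (0, 1):
        assert expected_over_keys(P, b) == P
    # Embed a toy 3-bit message across 4 generation steps.
    message = [1, 0, 1]
    for t in range(4):
        bit = message[channel_for(t, len(message))]
        q = encode_bit(P, keyed_order("secret", t, len(P)), bit)
        print(t, bit, [str(x) for x in q])
```

Reversing the keyed order for bit 1 keeps the average over keys unbiased for both bit values, because a uniformly random ordering and its reversal are equally likely, and the γ-style map satisfies $g(u) - g(1-u) = 2u - 1$, so each token's reweighted mass under an ordering and its reversal sums to twice its original probability. The sketch says nothing about Multi-Layer Sequential Reweighting or the evidence-accumulation detector.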