Watermarking is a technical means of deterring malicious use of Large Language Models (LLMs). This paper proposes a novel watermarking scheme, called WaterMax, that achieves high detectability while preserving the quality of the text generated by the original LLM. Its design leaves the LLM untouched, with no modification of the weights, logits, temperature, or sampling technique. Whereas the watermarking techniques in the literature inherently impose a trade-off between quality and robustness, WaterMax instead balances robustness against complexity. Its performance is both theoretically proven and experimentally validated. It outperforms all state-of-the-art techniques under the most complete benchmark suite.