Watermarking is a principled approach for tracing the provenance of large language model (LLM) outputs, but its practical deployment is hindered by inference inefficiency. Speculative sampling accelerates inference, and its efficiency improves as the rate at which the target model accepts draft-model tokens increases. Yet recent work reveals a fundamental trade-off: stronger watermarks lower the acceptance rate, so high watermark strength and high sampling efficiency cannot be achieved simultaneously. We revisit this trade-off and show that it is not absolute. We introduce a quantitative measure of watermark strength that governs statistical detectability and is maximized when tokens are deterministic functions of pseudorandom numbers. Using this measure, we fully characterize the trade-off as a constrained optimization problem and derive explicit Pareto curves for two existing watermarking schemes. Finally, we introduce a principled mechanism that injects pseudorandomness into draft-token acceptance, ensuring maximal watermark strength while maintaining speculative sampling efficiency. Experiments further show that this approach improves detectability without sacrificing efficiency. Our findings uncover a principle that unites speculative sampling and watermarking, paving the way for their efficient and practical deployment.
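The abstract does not spell out the acceptance mechanism. What follows is only a minimal Python sketch of the general idea under stated assumptions, not the authors' method. It runs one draft-and-verify step of standard speculative sampling: a single draft token x is accepted with probability min(1, p(x)/q(x)), and a rejection is resampled from the normalized residual max(0, p - q). Every fresh random draw is replaced by a pseudorandom one derived from a secret key and the context. The PRNG construction (SHA-256 seeding a NumPy generator), the use of Gumbel-max sampling, and the names `keyed_rng`, `gumbel_argmax`, and `watermarked_spec_step` are hypothetical choices for illustration. Because every draw is keyed, the emitted token is a deterministic function of the key and context. The output distribution still matches the target p, provided the keyed draws behave like independent uniforms.

```python
import hashlib

import numpy as np


def keyed_rng(key: bytes, context: tuple, salt: str) -> np.random.Generator:
    """Hypothetical keyed PRNG: seeds a NumPy generator from a hash of (key, salt, context)."""
    digest = hashlib.sha256(key + salt.encode() + repr(context).encode()).digest()
    return np.random.default_rng(int.from_bytes(digest[:8], "big"))


def gumbel_argmax(probs: np.ndarray, rng: np.random.Generator) -> int:
    """Gumbel-max sampling: the chosen token is a deterministic function of rng's uniforms."""
    u = np.clip(rng.random(probs.shape[0]), 1e-12, 1.0 - 1e-12)
    gumbel = -np.log(-np.log(u))
    with np.errstate(divide="ignore"):
        scores = np.log(probs) + gumbel  # zero-probability tokens score -inf
    return int(np.argmax(scores))


def watermarked_spec_step(
    p: np.ndarray, q: np.ndarray, key: bytes, context: tuple
) -> tuple[int, bool]:
    """One draft-and-verify step of speculative sampling with a single draft token.

    p: target-model distribution, q: draft-model distribution (both sum to 1).
    Returns (token, accepted). All randomness comes from keyed_rng, so for fixed
    p and q the token is a deterministic function of (key, context).
    """
    # Draft token sampled from q using keyed (pseudorandom) Gumbel noise.
    draft = gumbel_argmax(q, keyed_rng(key, context, "draft"))
    # Standard acceptance test min(1, p/q), but with a pseudorandom uniform.
    r = keyed_rng(key, context, "accept").random()
    if r < min(1.0, p[draft] / q[draft]):
        return draft, True
    # Rejection implies p != q, so the residual max(0, p - q) has positive mass.
    residual = np.maximum(p - q, 0.0)
    residual /= residual.sum()
    return gumbel_argmax(residual, keyed_rng(key, context, "residual")), False


if __name__ == "__main__":
    p = np.array([0.5, 0.3, 0.2])  # hypothetical target-model distribution
    q = np.array([0.2, 0.3, 0.5])  # hypothetical draft-model distribution
    print(watermarked_spec_step(p, q, b"secret-key", (1, 2, 3)))
```

The sketch omits detection entirely. A real scheme would also need to specify which context window seeds the PRNG and how a detector scores text against the keyed draws, and the abstract does not specify either.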