In this paper, we study the problem of watermarking large language models (LLMs). We consider the trade-off between model distortion and detection ability and formulate it as a constrained optimization problem based on the red-green list watermarking algorithm. We show that the optimal solution to this problem has an analytical property that gives a better understanding of the watermarking process and inspires the algorithm design. In light of this formulation, we develop an online dual gradient ascent watermarking algorithm and prove its asymptotic Pareto optimality between model distortion and detection ability. In contrast to previous results, this result explicitly guarantees an increased green list probability on average, and hence improved detection ability. Moreover, we provide a systematic discussion of the choice of model distortion metrics for the watermarking problem. We justify our choice of KL divergence and present issues with the existing criteria of ``distortion-free'' and perplexity. Finally, we empirically evaluate our algorithms on extensive datasets against benchmark algorithms.
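The abstract does not state the formulation explicitly, so the following is only a minimal sketch of how an online dual update for a KL-constrained red-green list watermark could look. It assumes that the watermarked distribution multiplies green-token probabilities by exp(delta) and renormalizes, that the shift is tied to a dual variable by delta = 1/lam, and that lam is nudged up whenever the per-token KL divergence exceeds a budget. The function names, the step size, and the update rule are all illustrative assumptions, not the paper's algorithm.

```python
import math


def tilt_green(probs, green, delta):
    """Multiply green-token probabilities by exp(delta), then renormalize.

    Assumed form of the watermarked distribution (illustrative only).
    """
    weights = [p * math.exp(delta) if g else p for p, g in zip(probs, green)]
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(q, p):
    """KL(q || p) in nats; terms with q_i = 0 contribute nothing."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)


def dual_ascent_watermark(steps, kl_budget, step_size=0.5, lam=1.0, lam_min=1e-3):
    """Run a hypothetical online dual update over (probs, green_mask) pairs.

    Returns (average KL, average green-list probability, final lam).
    """
    kl_total, green_total, n = 0.0, 0.0, 0
    for probs, green in steps:
        # Assumed primal step: shift green tokens by delta = 1 / lam.
        q = tilt_green(probs, green, 1.0 / lam)
        kl = kl_divergence(q, probs)
        kl_total += kl
        green_total += sum(qi for qi, g in zip(q, green) if g)
        n += 1
        # Assumed dual step: raise lam (weaker watermark) when KL exceeds
        # the budget, lower it (stronger watermark) when there is slack.
        lam = max(lam_min, lam + step_size * (kl - kl_budget))
    n = max(n, 1)  # avoid division by zero on an empty stream
    return kl_total / n, green_total / n, lam


if __name__ == "__main__":
    # Toy stream: a uniform 4-token vocabulary with half the tokens green.
    toy_steps = [([0.25, 0.25, 0.25, 0.25], [True, True, False, False])] * 2000
    avg_kl, avg_green, final_lam = dual_ascent_watermark(toy_steps, kl_budget=0.05)
    print(avg_kl, avg_green, final_lam)
```

In this sketch the distortion budget is enforced only on average over the stream, which is one way to read the "averaged" guarantee in the abstract. A real implementation would derive the green list from a keyed hash of preceding tokens and sample the next token from the tilted distribution.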