In this paper, we study the problem of watermarking large language models (LLMs). We consider the trade-off between model distortion and detection ability and formulate it as a constrained optimization problem based on the green-red algorithm of Kirchenbauer et al. (2023a). We show that the optimal solution to this problem enjoys an analytical property that provides a better understanding of the watermarking process and inspires the algorithm design. In light of this formulation, we develop an online dual gradient ascent watermarking algorithm and prove its asymptotic Pareto optimality between model distortion and detection ability. In contrast to previous results, this explicitly guarantees an increase in the average green-list probability and hence in detection ability. Moreover, we provide a systematic discussion of the choice of model distortion metrics for the watermarking problem. We justify our choice of KL divergence and present issues with the existing criteria of "distortion-free" and perplexity. Finally, we empirically evaluate our algorithms on extensive datasets against benchmark algorithms.
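To make the formulation above concrete, here is a minimal sketch of an online dual ascent loop built on the green-red rule. It is an illustration under stated assumptions, not the paper's algorithm. It assumes the watermarked distribution is the usual green-red tilt (green-token probabilities multiplied by exp(delta), then renormalized), that the dual variable plays the role of delta, and that the constraint is a target average increase in green-list probability. The names `tilt`, `kl`, `green_mass`, `dual_ascent`, `target`, and `lr` are hypothetical. The next-token distributions and green lists are synthetic stand-ins for a real model and a keyed hash.

```python
import math
import random


def tilt(p, green, delta):
    """Green-red tilt: scale green-token probabilities by exp(delta), renormalize."""
    w = [pi * math.exp(delta) if g else pi for pi, g in zip(p, green)]
    z = sum(w)
    return [wi / z for wi in w]


def kl(q, p):
    """KL(q || p) in nats; terms with q_i = 0 contribute zero."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def green_mass(p, green):
    """Total probability assigned to green-list tokens."""
    return sum(pi for pi, g in zip(p, green) if g)


def dual_ascent(steps, target, lr, vocab=50, seed=0):
    """Online dual ascent on synthetic next-token distributions.

    target: desired average increase in green-list probability (assumed name).
    Returns (final delta, average green-list increase, average KL distortion).
    """
    rng = random.Random(seed)
    delta, gain_sum, kl_sum = 0.0, 0.0, 0.0
    for _ in range(steps):
        # Synthetic stand-ins for the model distribution and the keyed green list.
        raw = [rng.expovariate(1.0) for _ in range(vocab)]
        total = sum(raw)
        p = [r / total for r in raw]
        green = [rng.random() < 0.5 for _ in range(vocab)]

        q = tilt(p, green, delta)
        gain = green_mass(q, green) - green_mass(p, green)
        gain_sum += gain
        kl_sum += kl(q, p)

        # Dual step: raise delta while the constraint is violated, lower it
        # when the current token overshoots; keep delta nonnegative.
        delta = max(0.0, delta + lr * (target - gain))
    return delta, gain_sum / steps, kl_sum / steps


if __name__ == "__main__":
    final_delta, avg_gain, avg_kl = dual_ascent(steps=2000, target=0.2, lr=0.5)
    print(f"delta={final_delta:.3f} avg_gain={avg_gain:.3f} avg_kl={avg_kl:.4f}")
```

In this sketch the step size `lr` controls how quickly delta adapts: the running average of the green-list increase tracks `target`, while the KL term records the distortion incurred to achieve it.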