Recent years have witnessed a proliferation of valuable original natural language content in subscription-based media outlets, web novel platforms, and the outputs of large language models. Without proper security measures, however, this content is susceptible to piracy and potential misuse. This calls for a secure watermarking system that guarantees copyright protection through leakage tracing or ownership identification. To effectively combat piracy and protect copyrights, a watermarking framework should not only embed an adequate number of bits of information but also extract the watermark robustly despite possible corruption. In this work, we explore ways to advance both payload and robustness by following a well-known proposition from image watermarking, and we identify features in natural language that are invariant to minor corruption. Through a systematic analysis of the possible sources of errors, we further propose a corruption-resistant infill model. Our full method improves robustness over previous work by 16.8 percentage points on average across four datasets, three corruption types, and two corruption ratios. Code is available at https://github.com/bangawayoo/nlp-watermarking.