Recent years have witnessed a proliferation of valuable original natural language content in subscription-based media outlets, web novel platforms, and the outputs of large language models. However, without proper security measures, this content is susceptible to piracy and misuse. This calls for a secure watermarking system that guarantees copyright protection through leakage tracing or ownership identification. To effectively combat piracy and protect copyrights, a multi-bit watermarking framework should be able to embed a sufficient number of bits and extract the watermark robustly despite possible corruption. In this work, we explore ways to advance both payload and robustness by following a well-known proposition from image watermarking and identify features in natural language that are invariant to minor corruption. Through a systematic analysis of the possible sources of errors, we further propose a corruption-resistant infill model. Our full method improves robustness over previous work by 16.8 percentage points on average across four datasets, three corruption types, and two corruption ratios. Code is available at https://github.com/bangawayoo/nlp-watermarking.