The rapid proliferation of large language models (LLMs) has created an urgent need for reliable methods to detect whether a given text was generated by such models. In this paper, we propose SimMark, a post-hoc watermarking algorithm that makes LLM outputs traceable without requiring access to the model's internal logits, enabling compatibility with a wide range of LLMs, including API-only models. By combining rejection sampling with the similarity of semantic sentence embeddings to impose statistical patterns that are detectable yet imperceptible to humans, and by employing a soft counting mechanism, SimMark achieves robustness against paraphrasing attacks. Experimental results demonstrate that SimMark sets a new benchmark for robust watermarking of LLM-generated content, surpassing prior sentence-level watermarking techniques in robustness, sampling efficiency, and applicability across diverse domains, all while preserving text quality.
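The core idea above can be illustrated with a minimal sketch: candidate sentences are rejection-sampled until the semantic similarity between consecutive sentence embeddings falls inside a secret interval, and detection soft-counts how often that pattern holds. This is an illustrative toy, not the paper's implementation: the hash-based `embed` function stands in for a real semantic sentence encoder, and the interval bounds and soft-count margin are assumed values chosen for demonstration.

```python
import hashlib
import math

# Toy stand-in for a real semantic sentence encoder; a deterministic
# pseudo-embedding derived from a hash, used only so the sketch runs
# without a model. SimMark's actual encoder is not specified here.
def embed(sentence, dim=16):
    h = hashlib.sha256(sentence.encode("utf-8")).digest()
    v = [b / 255.0 - 0.5 for b in h[:dim]]
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def cosine(u, v):
    return sum(a * b for a, b in zip(u, v))

LOW, HIGH = -0.2, 0.2  # secret acceptance interval (illustrative values)

def generate_watermarked(prev_sentence, candidate_sampler, max_tries=50):
    """Rejection sampling: draw candidate next sentences until the
    embedding similarity with the previous sentence lands in [LOW, HIGH]."""
    prev_emb = embed(prev_sentence)
    cand = candidate_sampler()
    for _ in range(max_tries):
        if LOW <= cosine(prev_emb, embed(cand)) <= HIGH:
            return cand
        cand = candidate_sampler()
    return cand  # fall back to the last candidate if none qualifies

def soft_count(sentences, margin=0.05):
    """Soft counting: consecutive-sentence similarities inside the secret
    interval score 1, those just outside (within `margin`) score 0.5.
    Partial credit is what keeps detection robust when a paraphrase
    shifts similarities slightly across the interval boundary."""
    total = 0.0
    for prev, cur in zip(sentences, sentences[1:]):
        s = cosine(embed(prev), embed(cur))
        if LOW <= s <= HIGH:
            total += 1.0
        elif LOW - margin <= s <= HIGH + margin:
            total += 0.5
    return total
```

A detector would compare the soft count against its expectation for unwatermarked text (e.g., via a z-test over the number of sentence pairs) to decide whether the statistical pattern is present.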