Recently, watermarking schemes for large language models (LLMs) have been proposed to distinguish machine-generated text from human-written text. The present paper explores the philosophical, political, and ethical ramifications of implementing and using such schemes. As a backdrop, a definition of authorship that covers both machines (LLMs) and humans is proposed. It is argued that private watermarks may give private companies sweeping rights to determine authorship, which is incompatible with traditional standards of authorship determination. Then, possible ramifications of the so-called entropy dependence of watermarking mechanisms are explored. It is argued that entropy may vary across different socially salient groups, which could lead to group-dependent rates at which machine-generated text is detected. Specifically, groups more interested in low-entropy text may find that machine-generated text of interest to them is harder to detect.