A key component of generating text from modern language models (LMs) is the selection and tuning of decoding algorithms. These algorithms determine how text is generated from the internal probability distribution produced by the LM. Choosing a decoding algorithm and tuning its hyperparameters takes significant time, manual effort, and computation, and it also requires extensive human evaluation. The identity and hyperparameters of such decoding algorithms are therefore considered extremely valuable to their owners. In this work, we show, for the first time, that an adversary with typical API access to an LM can steal the type and hyperparameters of its decoding algorithms at very low monetary cost. Our attack is effective against popular LMs used in text generation APIs, including GPT-2, GPT-3, and GPT-Neo. We demonstrate that stealing such information is feasible for only a few dollars, e.g., $\$0.8$, $\$1$, $\$4$, and $\$40$ for the four versions of GPT-3.
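To make the terms "type" and "hyperparameters" of a decoding algorithm concrete, the following is a minimal sketch of three widely used decoding transforms (temperature scaling, top-k, and nucleus or top-p filtering) applied to a toy next-token distribution. It is illustrative only and is not the attack described in this work; the function names, the four-token vocabulary, and the logit values are assumptions chosen for the example.

```python
import math


def apply_temperature(logits, temperature):
    """Softmax over logits divided by a temperature (must be > 0)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize the rest to zero."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]


def top_p_filter(probs, p):
    """Keep the smallest set of most probable tokens whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = set(), 0.0
    for i in order:
        keep.add(i)
        mass += probs[i]
        if mass >= p:
            break
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]


# Hypothetical logits for a four-token vocabulary (assumed values).
logits = [2.0, 1.0, 0.0, -1.0]
base = apply_temperature(logits, 1.0)
sharp = apply_temperature(logits, 0.5)  # lower temperature sharpens the distribution
after_top_k = top_k_filter(base, 2)     # hyperparameter: k
after_top_p = top_p_filter(base, 0.9)   # hyperparameter: p
```

In this toy setting, top-k with k = 2 leaves two tokens with non-zero probability, while top-p with p = 0.9 leaves three, and lowering the temperature to 0.5 shifts more mass onto the most likely token. The algorithm type (which transform is applied) and its hyperparameters (temperature, k, p) are the quantities the abstract refers to.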