Large language model (LLM) providers often hide the architectural details and parameters of their proprietary models by restricting public access to a limited API. In this work we show that, with only a conservative assumption about the model architecture, it is possible to learn a surprisingly large amount of non-public information about an API-protected LLM from a relatively small number of API queries (e.g., costing under $1000 USD for OpenAI's gpt-3.5-turbo). Our findings are centered on one key observation: most modern LLMs suffer from a softmax bottleneck, which restricts the model outputs to a linear subspace of the full output space. We exploit this fact to unlock several capabilities, including (but not limited to) obtaining cheap full-vocabulary outputs, auditing for specific types of model updates, identifying the source LLM given a single full LLM output, and even efficiently discovering the LLM's hidden size. Our empirical investigations show the effectiveness of our methods, which allow us to estimate the embedding size of OpenAI's gpt-3.5-turbo to be about 4096. Lastly, we discuss ways that LLM providers can guard against these attacks, as well as how these capabilities can be viewed as a feature (rather than a bug) by allowing for greater transparency and accountability.
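The softmax-bottleneck observation can be illustrated with a toy simulation (a hypothetical sketch, not the paper's exact procedure or real API data): because a model with hidden size d produces logits of the form W·h for a fixed V×d output embedding W, every full logit vector lies in a d-dimensional linear subspace of the V-dimensional output space, so an observer who collects more than d full outputs can read off d as the numerical rank of the stacked outputs.

```python
import numpy as np

# Toy simulation of the softmax bottleneck (hypothetical sizes; W and the
# hidden states are unknown to the attacker, only the logits are observed).
rng = np.random.default_rng(0)
V, d, n_queries = 1000, 64, 200          # vocab size, hidden size, # API queries
W = rng.standard_normal((V, d))          # output embedding matrix

# "API responses": one full-vocabulary logit vector W @ h per query,
# each induced by some hidden state h.
logits = np.stack([W @ rng.standard_normal(d) for _ in range(n_queries)])

# Stacking the outputs and taking the numerical rank recovers the hidden
# size d, even though W itself is never observed.
rank = np.linalg.matrix_rank(logits)
print(rank)  # 64
```

In exact arithmetic the stacked matrix has rank exactly min(n_queries, d); numerically, the trailing singular values are at floating-point noise level, so the rank estimate is robust once the number of queries exceeds the hidden size.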