Large language model (LLM) providers often hide the architectural details and parameters of their proprietary models by restricting public access to a limited API. In this work we show that, with only a conservative assumption about the model architecture, it is possible to learn a surprisingly large amount of non-public information about an API-protected LLM from a relatively small number of API queries (e.g., costing under $1000 USD for OpenAI's gpt-3.5-turbo). Our findings are centered on one key observation: most modern LLMs suffer from a softmax bottleneck, which restricts the model outputs to a linear subspace of the full output space. We exploit this fact to unlock several capabilities, including (but not limited to) obtaining cheap full-vocabulary outputs, auditing for specific types of model updates, identifying the source LLM given a single full LLM output, and even efficiently discovering the LLM's hidden size. Our empirical investigations show the effectiveness of our methods, which allow us to estimate the embedding size of OpenAI's gpt-3.5-turbo to be about 4096. Lastly, we discuss ways that LLM providers can guard against these attacks, as well as how these capabilities can be viewed as a feature (rather than a bug) by allowing for greater transparency and accountability.
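The softmax-bottleneck observation can be illustrated with a toy simulation (a hypothetical sketch, not the paper's exact procedure or real API data): because a model with hidden size d produces logits of the form W·h for a fixed V×d output embedding W, every full logit vector lies in a d-dimensional linear subspace of the V-dimensional output space, so an observer who collects more than d full outputs can read off d as the numerical rank of the stacked outputs.

```python
import numpy as np

# Toy simulation of the softmax bottleneck (hypothetical sizes; W and the
# hidden states are unknown to the attacker, only the logits are observed).
rng = np.random.default_rng(0)
V, d, n_queries = 1000, 64, 200          # vocab size, hidden size, # API queries
W = rng.standard_normal((V, d))          # output embedding matrix

# "API responses": one full-vocabulary logit vector W @ h per query,
# each induced by some hidden state h.
logits = np.stack([W @ rng.standard_normal(d) for _ in range(n_queries)])

# Stacking the outputs and taking the numerical rank recovers the hidden
# size d, even though W itself is never observed.
rank = np.linalg.matrix_rank(logits)
print(rank)  # 64
```

In exact arithmetic the stacked matrix has rank exactly min(n_queries, d); numerically, the trailing singular values are at floating-point noise level, so the rank estimate is robust once the number of queries exceeds the hidden size.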