Embedding models that generate representation vectors from natural language text are widely used, reflect substantial investments, and carry significant commercial value. Companies such as OpenAI and Cohere have developed competing embedding models accessed through APIs that require users to pay for usage. In this architecture, the models are "hidden" behind APIs, but this does not mean that they are "well guarded". We present, to our knowledge, the first effort to "steal" these models for retrieval by training local models on text-embedding pairs obtained from the commercial APIs. Our experiments on standard benchmarks show that the retrieval effectiveness of the commercial embedding models can be efficiently replicated by an attack costing only around $200, which trains (presumably) smaller models with fewer dimensions. Our findings raise important considerations for deploying commercial embedding models and suggest measures to mitigate the risk of model theft.
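The attack described above amounts to distillation through an API: collect (text, embedding) pairs from the commercial service, then fit a local "student" model to reproduce the teacher's vectors. The following is a minimal sketch of that idea, not the paper's actual method. The teacher here is a stand-in (a hidden random projection of bag-of-words features, playing the role of the API), and the student is a simple linear map fit by least squares; the corpus, featurizer, and dimensions are all illustrative assumptions.

```python
# Sketch of distillation-style model stealing for embeddings (illustrative only).
# Teacher = hypothetical "API" (a hidden linear projection the attacker cannot see);
# student = local linear model trained on the harvested (text, embedding) pairs.
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus standing in for the texts an attacker would send to the API.
corpus = [
    "embedding models map text to vectors",
    "commercial apis return embeddings for a fee",
    "a local student model imitates the teacher",
    "retrieval ranks documents by vector similarity",
]

# Toy featurizer: bag-of-words counts over the corpus vocabulary.
vocab = sorted({w for doc in corpus for w in doc.split()})

def featurize(text):
    counts = np.zeros(len(vocab))
    for w in text.split():
        if w in vocab:
            counts[vocab.index(w)] += 1.0
    return counts

X = np.stack([featurize(d) for d in corpus])  # shape: (n_docs, |vocab|)

# Stand-in for the commercial API: a hidden projection to 8 dimensions.
teacher_W = rng.normal(size=(len(vocab), 8))
Y = X @ teacher_W  # the "API" embeddings the attacker observes

# "Steal" the model: fit a linear student on the harvested pairs.
student_W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Measure how closely the student reproduces the teacher on these texts.
approx = X @ student_W
cos = np.sum(approx * Y, axis=1) / (
    np.linalg.norm(approx, axis=1) * np.linalg.norm(Y, axis=1)
)
print(np.round(cos, 3))
```

In this linear toy setting the student recovers the teacher exactly on the training texts (cosine similarity near 1.0); the paper's point is that the same harvest-and-imitate recipe, with a real neural student trained on API outputs, approaches the commercial model's retrieval effectiveness at low cost.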