The Machine Learning as a Service (MLaaS) market is expanding rapidly and maturing. For example, OpenAI's ChatGPT is an advanced large language model (LLM) that answers a wide range of queries for a fee. Although such models deliver satisfactory performance, they are far from perfect. Researchers have long studied the vulnerabilities and limitations of LLMs, such as adversarial attacks and model toxicity. Commercial ML models are not exempt from these issues, which becomes more problematic as MLaaS continues to grow. In this paper, we identify a new attack strategy against LLM APIs, the prompt abstraction attack. Specifically, we propose Mondrian, a simple method that abstracts sentences and thereby lowers the cost of using LLM APIs. In this approach, the adversary first creates a pseudo API (with a lower established price) to serve as a proxy for the target API (with a higher established price). Next, the pseudo API uses Mondrian to modify the user query, obtains the abstracted response from the target API, and forwards it back to the end user. Our results show that Mondrian reduces the token length of user queries by 13% to 23% across various tasks, including text classification, generation, and question answering. Meanwhile, the abstracted queries do not significantly affect the utility of task-specific models or of general language models such as ChatGPT. Mondrian also reduces the token length of instruction prompts by at least 11% without compromising output quality. As a result, the prompt abstraction attack enables the adversary to profit without bearing the cost of API development and deployment.
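The pseudo-API flow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `abstract_query` is a toy filler-word filter standing in for Mondrian (whose actual method is not given in this abstract), `fake_target_api` is a stub rather than a real LLM endpoint, tokens are counted by whitespace, and billing is assumed to be per input token only, with the end user charged on the original query length.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical stand-in for Mondrian: a small filler-word list.
# The real abstraction method is NOT specified in this abstract.
FILLER_WORDS = {"please", "kindly", "very", "really", "just", "the", "a", "an"}


def count_tokens(text: str) -> int:
    """Whitespace token count; a rough proxy for a real tokenizer."""
    return len(text.split())


def abstract_query(query: str) -> str:
    """Toy abstraction: drop filler words, keep the remaining words in order."""
    kept = [w for w in query.split() if w.lower() not in FILLER_WORDS]
    return " ".join(kept)


def fake_target_api(prompt: str) -> str:
    """Stub for the higher-priced target API; reports the prompt length."""
    return f"answered {count_tokens(prompt)} tokens"


@dataclass
class PseudoAPI:
    target_api: Callable[[str], str]  # the target API being proxied
    target_price: int                 # price units per token paid to the target
    pseudo_price: int                 # lower price units per token charged to the user

    def query(self, user_query: str) -> Tuple[str, int]:
        """Shorten the query, forward it, and return (response, adversary margin)."""
        shortened = abstract_query(user_query)
        response = self.target_api(shortened)
        # Assumption: the user is billed on the original query length.
        revenue = count_tokens(user_query) * self.pseudo_price
        # Assumption: the adversary pays only for the shortened input tokens.
        cost = count_tokens(shortened) * self.target_price
        return response, revenue - cost


if __name__ == "__main__":
    api = PseudoAPI(target_api=fake_target_api, target_price=10, pseudo_price=9)
    response, margin = api.query("Please summarize the very long report about a new API")
    print(response, margin)
```

Under these assumptions, if the abstraction removes a fraction r of the tokens, the adversary's margin per original token is pseudo_price - (1 - r) * target_price, which is positive only when r exceeds 1 - pseudo_price / target_price. With the reported 13% to 23% reductions, the pseudo API would therefore have to undercut the target price by less than the achieved reduction to remain profitable.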