The increasing reliance on large language models (LLMs) such as ChatGPT across many fields has made ``prompt engineering,'' the practice of designing prompts that improve the quality of model outputs, increasingly important. As companies invest heavily in expert prompt engineers and educational resources expand to meet market demand, designing high-quality prompts has become an intriguing challenge. In this paper, we propose a novel attack against LLMs, named prompt stealing attacks, which aims to steal such well-designed prompts from the answers they generate. The attack consists of two primary modules: the parameter extractor and the prompt reconstructor. The parameter extractor aims to infer the properties of the original prompt. We observe that most prompts fall into one of three categories: direct prompts, role-based prompts, and in-context prompts. The parameter extractor first identifies the prompt type from the generated answer and then, depending on that type, predicts which role is assigned (for role-based prompts) or how many in-context examples are used (for in-context prompts). The prompt reconstructor then takes the generated answer together with the extracted parameters and produces a reversed prompt that closely resembles the original one. Our experimental results show that the proposed attacks perform remarkably well. These attacks add a new dimension to the study of prompt engineering and call for more attention to the security of LLMs.
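The two-stage pipeline described above (answer → extracted parameters → reversed prompt) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The keyword cues in `extract_parameters` (a leading "As a <role>," or repeated "Example:" markers) are hypothetical stand-ins for a learned classifier. `reconstruct_prompt` assumes the attacker can query some LLM, modeled here as any `str -> str` callable. All names are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PromptParams:
    """Properties of the hidden prompt inferred from a generated answer."""
    prompt_type: str                    # "direct", "role-based", or "in-context"
    role: Optional[str] = None          # only set for role-based prompts
    num_contexts: Optional[int] = None  # only set for in-context prompts


def extract_parameters(answer: str) -> PromptParams:
    """Toy parameter extractor: hypothetical keyword heuristics that stand in
    for a trained classifier over answers."""
    text = answer.strip().lower()
    if text.startswith("as a "):
        # Hypothetical cue: "As a <role>, ..." suggests a role-based prompt.
        role = text[len("as a "):].split(",", 1)[0].strip()
        return PromptParams("role-based", role=role)
    n_examples = text.count("example:")
    if n_examples > 0:
        # Hypothetical cue: each "example:" marker counts as one demonstration.
        return PromptParams("in-context", num_contexts=n_examples)
    return PromptParams("direct")


def reconstruct_prompt(answer: str, params: PromptParams,
                       llm: Callable[[str], str]) -> str:
    """Toy prompt reconstructor: asks an attacker-controlled LLM to invert the
    answer, conditioned on the extracted parameters."""
    hints = [f"Prompt type: {params.prompt_type}."]
    if params.role is not None:
        hints.append(f"Assigned role: {params.role}.")
    if params.num_contexts is not None:
        hints.append(f"Number of in-context examples: {params.num_contexts}.")
    query = (" ".join(hints)
             + "\nWrite the prompt that most likely produced this answer:\n"
             + answer)
    return llm(query)


if __name__ == "__main__":
    answer = "As a travel agent, I recommend visiting Kyoto in autumn."
    params = extract_parameters(answer)
    print(params)
    # Stub "LLM" that echoes its input so the sketch runs offline.
    print(reconstruct_prompt(answer, params, llm=lambda q: q))
```

The stub lambda only shows the data flow. A meaningful reconstruction would require replacing it with a real model client and replacing the heuristics with trained classifiers, which this sketch deliberately leaves out.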