PRSA: Prompt Reverse Stealing Attacks against Large Language Models

Prompts, recognized as crucial intellectual property, enable large language models (LLMs) to perform specific tasks without fine-tuning, which underscores their growing importance. With the rise of prompt-based services, such as prompt marketplaces and LLM applications, providers often display a prompt's capabilities through input-output examples to attract users. However, this paradigm raises a pivotal security concern: does exposing input-output pairs risk leaking the prompt and thereby infringe on the developers' intellectual property rights? To our knowledge, this problem has not yet been comprehensively explored. To fill this gap, we perform the first in-depth exploration and propose PRSA, a novel attack framework for reverse-stealing prompts against commercial LLMs. The main idea of PRSA is to analyze the critical features of the input-output pairs in order to mimic and gradually infer (steal) the target prompts. PRSA consists of two key phases: prompt mutation and prompt pruning. In the prompt mutation phase, we propose a prompt attention algorithm based on differential feedback that captures these critical features so that the target prompts can be inferred effectively. In the prompt pruning phase, we identify and mask the words that depend on specific inputs, so that the inferred prompts generalize to diverse inputs. Through extensive evaluation, we verify that PRSA poses a severe threat in real-world scenarios. We have reported these findings to prompt service providers and are actively collaborating with them on protective measures for prompt copyright.
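
The pruning phase can be illustrated with a small sketch. The abstract does not say how input-dependent words are identified, so the version below makes a simplifying assumption: a word in the inferred prompt counts as input-dependent if it also appears in the example input. The names `prune_prompt`, `MASK`, and `STOPWORDS` are hypothetical and are not taken from PRSA.

```python
import re

MASK = "[MASK]"  # hypothetical placeholder token, not from the paper

# Small illustrative stopword list so common function words are never masked.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "and", "or", "for", "on", "with", "about"}


def prune_prompt(candidate_prompt: str, example_input: str) -> str:
    """Mask words in an inferred prompt that also occur in the example input.

    Assumption: plain word overlap stands in for the paper's (unspecified)
    test of whether a word depends on a specific input.
    """
    input_words = {w.lower() for w in re.findall(r"\w+", example_input)} - STOPWORDS

    def mask(match: re.Match) -> str:
        word = match.group(0)
        return MASK if word.lower() in input_words else word

    # Replace each word token independently; punctuation and spacing are kept.
    return re.sub(r"\w+", mask, candidate_prompt)


if __name__ == "__main__":
    inferred = "Write a cheerful product description about wireless headphones for young shoppers."
    print(prune_prompt(inferred, "wireless headphones"))
```

Under this assumption, the words copied from the single example input are replaced by a placeholder, while the task instruction and style cues stay in place, which is the generalization effect the pruning phase aims for.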
