The integration of large language models (LLMs) into a wide range of applications has highlighted the critical role of well-crafted system prompts, which take extensive testing and domain expertise to develop. These prompts enhance task performance but may also encode sensitive information and filtering criteria, posing security risks if exposed. Recent research shows that system prompts are vulnerable to extraction attacks, while existing defenses are either easily bypassed or require constant updates to address new threats. In this work, we introduce ProxyPrompt, a novel defense mechanism that prevents prompt leakage by replacing the original prompt with a proxy. The proxy preserves the original task's utility while obfuscating whatever prompt an attacker extracts, so that attackers cannot reproduce the task or access sensitive information. Comprehensive evaluations on 264 LLM and system prompt pairs show that ProxyPrompt protects 94.70% of prompts from extraction attacks, outperforming the next-best defense, which achieves only 42.80%.