LLM agents rely on prompts to implement task-specific capabilities on top of foundation LLMs, which makes agent prompts valuable intellectual property. In untrusted deployments, however, adversaries can copy these prompts and reuse them with other proprietary LLMs, causing economic losses. To protect agent prompts, we identify four key requirements that existing approaches fail to address: proactivity, runtime protection, usability, and non-portability. We present PragLocker, a prompt protection scheme that satisfies all four. PragLocker constructs function-preserving obfuscated prompts in two steps: it first anchors the prompt's semantics with code symbols, then uses feedback from the target model to inject noise, yielding prompts that work only on the target LLM. Experiments across multiple agent systems, datasets, and foundation LLMs show that PragLocker substantially reduces cross-LLM portability, maintains performance on the target LLM, and remains robust against adaptive attackers.