In LLM/VLM agents, prompt privacy risk propagates beyond a single model call because raw user content can flow into retrieval queries, memory writes, tool calls, and logs. Existing de-identification pipelines address document boundaries but not this cross-stage propagation. We propose BodhiPromptShield, a policy-aware framework that detects sensitive spans, routes them via typed placeholders, semantic abstraction, or secure symbolic mapping, and delays restoration to authorized boundaries. Relative to enterprise redaction, it adds explicit propagation-aware mediation and treats restoration timing as a security variable. Under controlled evaluation on the Controlled Prompt-Privacy Benchmark (CPPB), stage-wise propagation is reduced from 10.7\% to 7.1\% across the retrieval, memory, and tool stages; PER reaches 9.3\% with 0.94 AC and 0.92 TSR, outperforming generic de-identification. These are controlled systems results on CPPB, not formal privacy guarantees or public-benchmark transfer claims. The project repository is available at https://github.com/mabo1215/BodhiPromptShield.git.
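The detect, route, and delayed-restore flow described above can be illustrated with a minimal Python sketch. It covers only the typed-placeholder route, and everything in it is an assumption made for illustration: the `PlaceholderVault` class, the two regex detectors, and the `authorized` flag are hypothetical and are not taken from the BodhiPromptShield repository.

```python
import re
from dataclasses import dataclass, field

# Hypothetical detectors: two regexes stand in for a real sensitive-span detector.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


@dataclass
class PlaceholderVault:
    """Keeps the placeholder -> original mapping outside the agent pipeline."""

    mapping: dict = field(default_factory=dict)
    counters: dict = field(default_factory=dict)

    def mask(self, text: str) -> str:
        """Replace each detected span with a typed placeholder such as [EMAIL_1]."""
        for label, pattern in PATTERNS.items():

            def substitute(match, label=label):
                value = match.group(0)
                # Reuse the placeholder if this exact value was seen before.
                for placeholder, original in self.mapping.items():
                    if original == value:
                        return placeholder
                count = self.counters.get(label, 0) + 1
                self.counters[label] = count
                placeholder = f"[{label}_{count}]"
                self.mapping[placeholder] = value
                return placeholder

            text = pattern.sub(substitute, text)
        return text

    def restore(self, text: str, authorized: bool) -> str:
        """Restore originals only at a boundary the policy marks as authorized."""
        if not authorized:
            return text
        for placeholder, original in self.mapping.items():
            text = text.replace(placeholder, original)
        return text


vault = PlaceholderVault()
prompt = "Email alice@example.com or call +1 555-010-9999 today"
masked = vault.mask(prompt)

# Retrieval, memory, and tool stages would only ever receive `masked`.
tool_view = vault.restore(masked, authorized=False)
# The final, authorized boundary is the only place originals reappear.
user_view = vault.restore(masked, authorized=True)
```

In this sketch the masked prompt is what would flow into retrieval queries, memory writes, tool calls, and logs, so the raw values never leave the vault until the single authorized restoration step. Semantic abstraction and secure symbolic mapping, the other two routes named above, are not modeled here.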