Retrieval-Augmented Generation (RAG) systems introduce a critical vulnerability: contextual leakage, where adversaries exploit instruction-following to exfiltrate Personally Identifiable Information (PII) via adaptive extraction. Current defenses force a rigid trade-off between semantic utility and latency. We present SEAL-Tag, a privacy-preserving runtime environment that resolves this via a Verify-then-Route paradigm. SEAL-Tag introduces the SEAL-Probe protocol, transforming auditing into a structured tool-use operation where the model generates a verifiable PII-Evidence Table (PET) alongside its draft. To adjudicate this evidence, we employ a Probabilistic Circuit (PC) that enforces verifiable logical constraints for robust decision-making. To overcome the privacy "Cold Start" problem, we introduce the S0--S6 Anchored Synthesis Pipeline, generating high-fidelity, provenanced RAG interactions. We pair this with a Two-Stage Curriculum that first optimizes for entity detection before aligning the model to the rigorous audit protocol. Our evaluation demonstrates that SEAL-Tag establishes a new Pareto frontier, reducing adaptive leakage by over 8$\times$ while matching the utility and speed of unsafe baselines.