Retrieval-augmented generation (RAG) systems often serialize user queries, retrieved documents, metadata, system labels, and task instructions into one natural-language prompt. We study a source-authority boundary failure in this design: attacker-authored retrieved text can impersonate metadata, provenance, authority, or disclosure-policy signals that appear control-relevant to the model. We call this pattern Document-Authored Control-Signal Impersonation (DACSI). DACSI is a non-imperative, metadata-like payload subclass within indirect prompt injection. Its central lesson is simple: document-authored labels are data, not policy. Command-style injection asks the model to ignore, override, or violate policy; DACSI asks whether untrusted document text can be misattributed as an authorized control signal when RAG prompt rendering collapses trusted and untrusted text into the same natural-language channel. We evaluate DACSI across six model settings, prompt-pressure levels, injection baselines, signal taxonomies, RAG-mediated pipelines, system-control probes, a source-authority attribution probe, and synthetic canary formats. We interpret the evidence by model regime rather than as six equal replications: DeepSeek V4 Pro and Qwen3.5-397B provide the cleanest positive lift, DeepSeek V4 Flash is a high-susceptibility setting, GPT-5.5 and Gemini 3.1 Pro Low are strong-boundary probes with selected residual risks, and GLM-4.7 is a saturated leakage boundary case. Across these regimes, DACSI warrants separate evaluation because it uses a command-free metadata/provenance/policy surface, follows a RAG-specific source-authority path, and responds to source/channel separation. The source-authority probe is behavioral attribution evidence, not proof of an internal mechanism.
翻译:检索增强生成(RAG)系统通常将用户查询、检索文档、元数据、系统标签和任务指令序列化为一条自然语言提示。本研究探讨了该设计中的源权威边界失效问题:攻击者撰写的检索文本可冒充元数据、出处、权威或披露策略信号,这些信号对模型而言具有控制相关性。我们将此类模式称为"文档撰写的控制信号冒充"(DACSI)。DACSI属于间接提示注入中一类非命令式、类元数据载荷子类型。其核心启示很简单:文档撰写的标签是数据而非策略。命令式注入要求模型忽略、覆盖或违反策略;而DACSI关注的是:当RAG提示渲染将可信与不可信文本合并至同一自然语言通道时,不可信文档文本能否被误归因于授权控制信号。我们在六种模型设置、提示压力等级、注入基线、信号分类法、RAG中介管道、系统控制探针、源权威归因探针及合成金丝雀格式下评估了DACSI。我们按模型机制而非六次等重复实验解释证据:DeepSeek V4 Pro和Qwen3.5-397B展现出最清晰的正面提升,DeepSeek V4 Flash为高敏感性设置,GPT-5.5和Gemini 3.1 Pro Low是具有选择性残余风险的强边界探针,GLM-4.7则为饱和泄漏边界案例。在这些机制中,DACSI需要独立评估,因其利用无命令的元数据/出处/策略表面,遵循RAG特定源权威路径,并对源/通道分离产生响应。源权威探针属于行为归因证据,而非内部机制的证明。