Document-Authored Control-Signal Impersonation: A Low-Cost Indirect Prompt Attack on RAG Safety Boundaries

Retrieval-augmented generation (RAG) systems often serialize user queries, retrieved documents, metadata, system labels, and task instructions into one natural-language prompt. We study a source-authority boundary failure in this design: attacker-authored retrieved text can impersonate metadata, provenance, authority, or disclosure-policy signals that appear control-relevant to the model. We call this pattern Document-Authored Control-Signal Impersonation (DACSI). DACSI is a non-imperative, metadata-like payload subclass within indirect prompt injection. Its central lesson is simple: document-authored labels are data, not policy. Command-style injection asks the model to ignore, override, or violate policy; DACSI asks whether untrusted document text can be misattributed as an authorized control signal when RAG prompt rendering collapses trusted and untrusted text into the same natural-language channel. We evaluate DACSI across six model settings, prompt-pressure levels, injection baselines, signal taxonomies, RAG-mediated pipelines, system-control probes, a source-authority attribution probe, and synthetic canary formats. We interpret the evidence by model regime rather than as six equal replications: DeepSeek V4 Pro and Qwen3.5-397B provide the cleanest positive lift, DeepSeek V4 Flash is a high-susceptibility setting, GPT-5.5 and Gemini 3.1 Pro Low are strong-boundary probes with selected residual risks, and GLM-4.7 is a saturated leakage boundary case. Across these regimes, DACSI warrants separate evaluation because it uses a command-free metadata/provenance/policy surface, follows a RAG-specific source-authority path, and responds to source/channel separation. The source-authority probe is behavioral attribution evidence, not proof of an internal mechanism.

翻译：检索增强生成（RAG）系统通常将用户查询、检索文档、元数据、系统标签和任务指令序列化为一条自然语言提示。本研究探讨了该设计中的源权威边界失效问题：攻击者撰写的检索文本可冒充元数据、出处、权威或披露策略信号，这些信号对模型而言具有控制相关性。我们将此类模式称为"文档撰写的控制信号冒充"（DACSI）。DACSI属于间接提示注入中一类非命令式、类元数据载荷子类型。其核心启示很简单：文档撰写的标签是数据而非策略。命令式注入要求模型忽略、覆盖或违反策略；而DACSI关注的是：当RAG提示渲染将可信与不可信文本合并至同一自然语言通道时，不可信文档文本能否被误归因于授权控制信号。我们在六种模型设置、提示压力等级、注入基线、信号分类法、RAG中介管道、系统控制探针、源权威归因探针及合成金丝雀格式下评估了DACSI。我们按模型机制而非六次等重复实验解释证据：DeepSeek V4 Pro和Qwen3.5-397B展现出最清晰的正面提升，DeepSeek V4 Flash为高敏感性设置，GPT-5.5和Gemini 3.1 Pro Low是具有选择性残余风险的强边界探针，GLM-4.7则为饱和泄漏边界案例。在这些机制中，DACSI需要独立评估，因其利用无命令的元数据/出处/策略表面，遵循RAG特定源权威路径，并对源/通道分离产生响应。源权威探针属于行为归因证据，而非内部机制的证明。

相关内容

元数据

关注 7

元数据（Metadata），又称元数据、中介数据、中继数据[来源请求]，为描述数据的数据（data about data），主要是描述数据属性（property）的信息，用来支持如指示存储位置、历史数据、资源查找、文件纪录等功能。元数据算是一种电子式目录，为了达到编制目录的目的，必须在描述并收藏数据的内容或特色，进而达成协助数据检索的目的。

【AAAI2026】TruthfulRAG：基于知识图谱解决检索增强生成中的事实层冲突

专知会员服务

22+阅读 · 2025年11月15日

检索增强生成（RAG）技术，261页slides

专知会员服务

42+阅读 · 2025年10月16日

【新书】Essential GraphRAG: 知识图谱增强的RAG

专知会员服务

35+阅读 · 2025年7月17日

图增强生成（GraphRAG）

专知会员服务

35+阅读 · 2025年1月4日