Retrieval-Augmented Generation (RAG) has attracted significant attention for its ability to combine the generative capabilities of Large Language Models (LLMs) with knowledge obtained through efficient retrieval over large-scale data collections. However, most existing approaches overlook the risks of exposing sensitive or access-controlled information directly to the generation model. The few approaches that do address this issue instruct the generative model to refrain from disclosing sensitive information; recent studies, however, have demonstrated that LLMs remain vulnerable to prompt injection attacks capable of overriding such behavioral constraints. For these reasons, we propose a novel approach to Selective Disclosure in Retrieval-Augmented Generation, called SD-RAG, which decouples the enforcement of security and privacy constraints from the generation process itself. Rather than relying on prompt-level safeguards, SD-RAG applies sanitization and disclosure controls during the retrieval phase, before augmenting the language model's input. Moreover, we introduce a semantic mechanism for ingesting human-readable, dynamic security and privacy constraints, together with an optimized graph-based data model that supports fine-grained, policy-aware retrieval. Our experimental evaluation demonstrates the superiority of SD-RAG over existing baselines, achieving up to a $58\%$ improvement in privacy score, while also showing strong resilience to prompt injection attacks targeting the generative model.
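The core idea of enforcing disclosure controls at retrieval time, rather than via prompt-level instructions, can be illustrated with a minimal sketch. All names here (`Document`, `Policy`, `sanitize`) are illustrative assumptions, not SD-RAG's actual API: the point is only that access-controlled content is filtered out before it ever reaches the generation prompt, so a prompt injection cannot coax the model into revealing it.

```python
# Minimal sketch of retrieval-time disclosure control (hypothetical names,
# not the SD-RAG implementation): sensitive documents are removed before
# the LLM's input is augmented, so prompt-level attacks cannot expose them.
from dataclasses import dataclass, field


@dataclass
class Document:
    text: str
    labels: set = field(default_factory=set)  # e.g. {"public", "pii", "internal"}


@dataclass
class Policy:
    user_clearances: set  # labels the requesting user is allowed to see


def sanitize(docs, policy):
    """Keep only documents whose labels are fully covered by the user's
    clearances; everything else is withheld before prompt augmentation."""
    return [d for d in docs if d.labels <= policy.user_clearances]


corpus = [
    Document("Quarterly revenue grew 12%.", {"public"}),
    Document("Employee record with personal identifiers.", {"public", "pii"}),
]
policy = Policy(user_clearances={"public"})
visible = sanitize(corpus, policy)
# Only the non-sensitive document is retained for the generation prompt.
```

Because the filtering happens outside the model, the generative component never holds the withheld content in its context, which is what makes this design resilient to injection attacks that try to override behavioral instructions.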