DAVE: A Policy-Enforcing LLM Spokesperson for Secure Multi-Document Data Sharing

In current inter-organizational data spaces, usage policies are enforced mainly at the asset level: a whole document or dataset is either shared or withheld. When only parts of a document are sensitive, providers who want to avoid leaking protected information typically must manually redact documents before sharing them, which is costly, coarse-grained, and hard to maintain as policies or partners change. We present DAVE, a usage policy-enforcing LLM spokesperson that answers questions over private documents on behalf of a data provider. Instead of releasing documents, the provider exposes a natural language interface whose responses are constrained by machine-readable usage policies. We formalize policy-violating information disclosure in this setting, drawing on usage control and information flow security, and introduce virtual redaction: suppressing sensitive information at query time without modifying source documents. We describe an architecture for integrating such a spokesperson with Eclipse Dataspace Components and ODRL-style policies, and outline an initial provider-side integration prototype in which QA requests are routed through a spokesperson service instead of triggering raw document transfer. Our contribution is primarily architectural: we do not yet implement or empirically evaluate the full enforcement pipeline. We therefore outline an evaluation methodology to assess security, utility, and performance trade-offs under benign and adversarial querying as a basis for future empirical work on systematically governed LLM access to multi-party data spaces.
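As a rough illustration of virtual redaction, the sketch below filters policy-prohibited passages at query time, before any context would reach a language model, and leaves the source corpus untouched. It is not the DAVE implementation. The names `Passage`, `UsagePolicy`, `virtually_redact`, and `answer_context` are hypothetical, as are the per-passage sensitivity labels and the reduction of an ODRL-style prohibition to a set of labels. Filtering by label alone does not address inference from permitted passages or adversarial querying.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    doc_id: str
    text: str
    labels: frozenset  # hypothetical sensitivity labels, e.g. {"pricing"}


@dataclass(frozen=True)
class UsagePolicy:
    # Hypothetical simplification of an ODRL-style prohibition:
    # labels that the named consumer must not receive.
    consumer: str
    prohibited_labels: frozenset


def virtually_redact(passages, policy):
    """Return only the passages the policy permits; the input is not modified."""
    return [p for p in passages if not (p.labels & policy.prohibited_labels)]


def answer_context(question, passages, policy):
    """Pick permitted passages that share at least one word with the question."""
    terms = set(question.lower().split())
    permitted = virtually_redact(passages, policy)
    return [p for p in permitted if terms & set(p.text.lower().split())]


corpus = [
    Passage("contract-1", "Delivery is due within 30 days.", frozenset({"logistics"})),
    Passage("contract-1", "The unit price is 12 EUR.", frozenset({"pricing"})),
]
policy = UsagePolicy("partner-a", frozenset({"pricing"}))

# Only the logistics passage is eligible as context for this consumer.
context = answer_context("What is the delivery price?", corpus, policy)
```

In a fuller pipeline the filtered context would be passed to the spokesperson model, and a further check on the generated answer would likely be needed, since redacting inputs does not by itself rule out indirect disclosure.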
