As LLMs become woven into everyday workflows, user queries sent to cloud-hosted LLMs routinely mix task-essential content with sensitive disclosures that the task does not need. Type-based PII redaction is context-agnostic and can therefore fail in two ways: it over-discloses sensitive context that falls outside its predefined types, and it over-removes answer-bearing spans. We recast privacy-preserving query rewriting under Contextual Integrity (CI): a span should be forwarded only if it is necessary for the task. We introduce DelegateCI-Bench, the first task-based Contextual Integrity benchmark for privacy-conscious delegation, comprising 3,167 samples that combine high-quality synthetic data spanning 11 tasks and 20 task types, WildChat-based real user queries, and a medical challenge set with dense sensitive information. Building on this benchmark, we propose a CI-guided reinforcement learning framework that converts essential and non-essential sensitive spans into verifiable optimization signals, and we train a query rewriter to preserve task-critical information while suppressing unnecessary sensitive disclosure. Experiments show that our learned rewriter achieves the best privacy-utility tradeoff, with gains of up to +10.1 in average utility over on-device baselines.
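To make the idea of turning span annotations into a verifiable signal concrete, the following is a minimal sketch and not the reward used in this work. It assumes each sample carries two lists of annotated spans and scores a rewrite by verbatim, case-insensitive substring matching. The names `AnnotatedQuery`, `ci_reward`, and `leak_weight`, and the linear form of the score, are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedQuery:
    """A user query with span annotations (hypothetical schema)."""
    text: str
    essential: list[str]      # spans the task needs to be answerable
    non_essential: list[str]  # sensitive spans the task does not need


def fraction_present(spans: list[str], rewrite: str, if_empty: float) -> float:
    """Fraction of spans found verbatim (case-insensitive) in the rewrite."""
    if not spans:
        return if_empty
    lowered = rewrite.lower()
    return sum(span.lower() in lowered for span in spans) / len(spans)


def ci_reward(sample: AnnotatedQuery, rewrite: str, leak_weight: float = 1.0) -> float:
    """Illustrative score: essential-span retention minus weighted leakage.

    Assumption: substring matching stands in for whatever span check a
    real system would use; paraphrased spans are not detected here.
    """
    # With no essential spans there is nothing to lose, so retention is 1.
    retention = fraction_present(sample.essential, rewrite, if_empty=1.0)
    # With no sensitive spans there is nothing to leak, so leakage is 0.
    leakage = fraction_present(sample.non_essential, rewrite, if_empty=0.0)
    return retention - leak_weight * leakage


sample = AnnotatedQuery(
    text="I'm Jane Doe and I was just laid off. Convert 70 kg to pounds.",
    essential=["70 kg", "pounds"],
    non_essential=["Jane Doe", "laid off"],
)
print(ci_reward(sample, "Convert 70 kg to pounds."))      # keeps task, drops sensitive spans
print(ci_reward(sample, "Convert [WEIGHT] to [UNIT]."))   # over-removes answer-bearing spans
print(ci_reward(sample, sample.text))                      # forwards everything unchanged
```

Under this toy score, the rewrite that keeps the conversion request and drops the personal details ranks above both the over-redacted and the unchanged query, which mirrors the two failure modes described above. A practical reward would also need to handle paraphrase, which exact matching cannot.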