Prompts serve as a crucial interface for interacting with large language models (LLMs), strongly influencing the accuracy and interpretability of model outputs. However, obtaining accurate, high-quality responses requires precise prompts, which inevitably pose significant risks of personally identifiable information (PII) leakage. This paper therefore proposes DePrompt, a desensitization protection and effectiveness evaluation framework for prompts that enables users to utilize LLMs safely and transparently. Specifically, using LLM fine-tuning as the underlying privacy protection mechanism, we integrate contextual attributes to define privacy types, achieving high-precision PII entity identification. In addition, by analyzing the key features of prompt desensitization scenarios, we devise adversarial generative desensitization methods that retain important semantic content while disrupting the link between identifiers and privacy attributes. Furthermore, we present utility evaluation metrics for prompts to better gauge and balance privacy and usability. The framework is adaptable to diverse prompts and can be extended to other scenarios that depend on text usability. Experimental evaluations against benchmarks and competing methods demonstrate that our desensitized prompts achieve superior privacy protection while preserving the utility of model inference.
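To make the desensitization idea concrete, the following is a minimal, illustrative sketch: it detects PII entities with an off-the-shelf NER model and replaces them with type-tagged surrogates, severing the link between identifiers and privacy attributes while leaving the surrounding semantics intact. spaCy, the `en_core_web_sm` model, the `PII_LABELS` set, and the `desensitize` helper are assumptions introduced here for demonstration only; they stand in for, and do not reproduce, the paper's fine-tuned, adversarial generative method.

```python
# Illustrative sketch of prompt desensitization via surrogate substitution.
# NOT the paper's method: spaCy and en_core_web_sm are stand-in assumptions.
import spacy

# Entity types treated as PII in this toy example (an assumption).
PII_LABELS = {"PERSON", "GPE", "ORG", "DATE"}

def desensitize(prompt: str, nlp) -> str:
    """Replace detected PII spans with type-tagged surrogates so the prompt
    stays semantically usable while identifiers are removed."""
    doc = nlp(prompt)
    counters: dict[str, int] = {}
    repls: list[tuple[int, int, str]] = []
    # doc.ents is in document order, so surrogates are numbered left to right.
    for ent in doc.ents:
        if ent.label_ not in PII_LABELS:
            continue
        counters[ent.label_] = counters.get(ent.label_, 0) + 1
        repls.append((ent.start_char, ent.end_char,
                      f"<{ent.label_}_{counters[ent.label_]}>"))
    out = prompt
    # Splice right to left so earlier character offsets remain valid.
    for start, end, surrogate in reversed(repls):
        out = out[:start] + surrogate + out[end:]
    return out

if __name__ == "__main__":
    nlp = spacy.load("en_core_web_sm")  # assumes the model is installed
    print(desensitize("Alice Smith visited Berlin on 2023-05-01 for Acme Corp.", nlp))
    # -> "<PERSON_1> visited <GPE_1> on <DATE_1> for <ORG_1>."
```

Type-tagged surrogates (rather than blanket redaction) keep the prompt's structure legible to the downstream LLM, which is the property the paper's semantic-retention goal targets.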
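On the evaluation side, one plausible utility proxy is the semantic similarity between the original and desensitized prompt under a sentence-embedding model: the closer the score is to 1.0, the more semantic content the desensitized prompt retains. The sketch below implements this proxy; the `sentence-transformers` library, the `all-MiniLM-L6-v2` model name, and the `semantic_retention` helper are illustrative assumptions, not the paper's actual utility metrics.

```python
# Hedged illustration of a semantic-retention utility proxy for desensitized
# prompts. The embedding model and the metric are assumptions for demonstration.
import numpy as np
from sentence_transformers import SentenceTransformer

def semantic_retention(original: str, desensitized: str, model) -> float:
    """Cosine similarity of sentence embeddings; values near 1.0 indicate the
    desensitized prompt preserves most of the original semantics."""
    a, b = model.encode([original, desensitized])
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

if __name__ == "__main__":
    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
    orig = "Alice Smith visited Berlin on 2023-05-01 for Acme Corp."
    desens = "<PERSON_1> visited <GPE_1> on <DATE_1> for <ORG_1>."
    print(f"semantic retention: {semantic_retention(orig, desens, model):.3f}")
```

A proxy like this can be paired with a privacy score (e.g., the fraction of PII entities removed) to trade off the two objectives the abstract describes.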