With the increasing use of conversational AI systems, there is growing concern over privacy leaks, especially when users share sensitive personal data in interactions with Large Language Models (LLMs). Conversations shared with these models may contain Personally Identifiable Information (PII), which, if exposed, could lead to security breaches or identity theft. To address this challenge, we present the Local Optimizations for Pseudonymization with Semantic Integrity Directed Entity Detection (LOPSIDED) framework, a semantically-aware privacy agent designed to safeguard sensitive PII when querying remote LLMs. Unlike prior work that often degrades response quality, our approach dynamically replaces sensitive PII entities in user prompts with semantically consistent pseudonyms, preserving the contextual integrity of conversations. Once the model generates its response, the pseudonyms are automatically depseudonymized, ensuring the user receives an accurate, privacy-preserving output. We evaluate our approach using real-world conversations sourced from ShareGPT, which we further augment and annotate to assess whether named entities are contextually relevant to the model's response. Our results show that LOPSIDED reduces semantic utility errors by a factor of 5 compared to baseline techniques, all while enhancing privacy.
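To make the round trip concrete, below is a minimal sketch of the pseudonymize-query-depseudonymize flow the abstract describes. The entity detection step, the pseudonym choices, and the remote LLM call (`query_remote_llm`) are hypothetical placeholders for illustration, not the LOPSIDED implementation.

```python
# Sketch of the round trip: swap PII for semantically consistent pseudonyms,
# send the sanitized prompt to the remote model, then reverse the mapping
# in the response. All names and the remote call are illustrative stubs.

def pseudonymize(prompt: str, entities: dict[str, str]) -> str:
    """Replace each detected PII entity with its assigned pseudonym."""
    for real, fake in entities.items():
        prompt = prompt.replace(real, fake)
    return prompt

def depseudonymize(response: str, entities: dict[str, str]) -> str:
    """Map pseudonyms in the model's response back to the original entities."""
    for real, fake in entities.items():
        response = response.replace(fake, real)
    return response

# Semantically consistent pseudonyms: a person name maps to another plausible
# person name, a city to another city (chosen by hand here; LOPSIDED would
# select these via its entity detection and pseudonym generation steps).
entities = {"Alice Chen": "Maria Lopez", "Seattle": "Denver"}
prompt = "Draft a moving announcement for Alice Chen, who is leaving Seattle."

safe_prompt = pseudonymize(prompt, entities)
# response = query_remote_llm(safe_prompt)   # hypothetical remote call
response = "Maria Lopez is excited to announce her move from Denver!"
print(depseudonymize(response, entities))
# -> "Alice Chen is excited to announce her move from Seattle!"
```

The key property, per the abstract, is that the pseudonyms are semantically consistent with the originals, so the remote model's response remains contextually coherent and the reverse mapping restores an accurate answer.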