Individuals increasingly interact with online Large Language Models (LLMs), both in their work and personal lives. These interactions raise privacy concerns, as the LLMs are typically hosted by third parties who can gather a variety of sensitive information about users and their companies. Text sanitization techniques proposed in the literature can be used to sanitize user prompts before they are sent to the LLM. However, sanitization affects the downstream task performed by the LLM, often to such an extent that the results become unacceptable to the user. This is not a minor annoyance: it has clear monetary consequences, since LLM services charge on a per-use basis, and it wastes considerable computing resources. We propose an architecture that leverages a Small Language Model (SLM) on the user side to estimate the impact of sanitization on a prompt before it is sent to the LLM, thus preventing resource losses. Our evaluation of this architecture revealed a significant problem with text sanitization based on Differential Privacy, to which we wish to draw the community's attention for further investigation.
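The client-side gating idea described above can be sketched as follows. This is a minimal illustrative mock-up, not the paper's implementation: the function names (`sanitize`, `slm_utility`, `should_send`), the token-overlap utility proxy, and the threshold value are all assumptions made for illustration. A real system would use a DP- or NER-based sanitizer and an actual local SLM to estimate downstream utility.

```python
# Hypothetical sketch: a local check estimates how much sanitization
# degrades a prompt before any paid call to the remote LLM is made.
# All names and logic here are illustrative, not from the paper.

def sanitize(prompt: str) -> str:
    """Placeholder sanitization step: mask two hard-coded entities.
    A real sanitizer would detect and perturb sensitive spans."""
    return prompt.replace("Alice", "[PERSON]").replace("AcmeCorp", "[ORG]")

def slm_utility(original: str, sanitized: str) -> float:
    """Toy stand-in for a local SLM's utility estimate: the fraction
    of whitespace-separated tokens left unchanged by sanitization."""
    orig_tokens = original.split()
    san_tokens = sanitized.split()
    same = sum(o == s for o, s in zip(orig_tokens, san_tokens))
    return same / max(len(orig_tokens), 1)

def should_send(prompt: str, threshold: float = 0.7) -> tuple[bool, str]:
    """Gate the remote LLM call: send the sanitized prompt only if its
    estimated utility stays above the threshold, avoiding wasted cost."""
    sanitized = sanitize(prompt)
    score = slm_utility(prompt, sanitized)
    return score >= threshold, sanitized

if __name__ == "__main__":
    ok, sanitized = should_send("Summarize the report Alice wrote for AcmeCorp")
    print(ok, sanitized)
```

In this sketch, the gate returns both the decision and the sanitized text, so the client can either forward the prompt to the LLM or warn the user that sanitization has likely destroyed too much task-relevant content.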