Cloud-based large language models (LLMs) such as ChatGPT have become integral to daily operations, serving as vital tools across a wide range of applications. While these models offer substantial benefits in accessibility and functionality, they also raise significant privacy concerns: transmitting and storing user data in cloud infrastructure poses substantial risks of data breaches and unauthorized access to sensitive information. Even if the data is encrypted in transmission and storage, the LLM service provider itself can still see its actual contents, which prevents individuals and entities from using such services with confidence. To address these concerns, this paper proposes EmojiCrypt, a simple yet effective mechanism for protecting user privacy. EmojiCrypt uses emoji to encrypt user inputs before sending them to the LLM, rendering them indecipherable to human or LLM examination while retaining the original intent of the prompt, so that the model's performance remains unaffected. We conduct experiments on three tasks: personalized recommendation, sentiment analysis, and tabular data analysis. The results show that EmojiCrypt can encrypt personal information within prompts in a way that prevents humans or the LLM itself from discerning sensitive data, while achieving, without further tuning, task accuracy comparable to or even better than directly prompting the LLM without prompt encryption. These results highlight the practicality of encryption measures that safeguard user privacy without compromising the functional integrity and performance of LLMs. Code and dataset are available at https://github.com/agiresearch/EmojiCrypt.
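As a rough illustration of the idea described in the abstract (replacing sensitive content with emoji before the prompt leaves the user's machine), the following Python sketch uses a fixed, hand-written lookup table. This is a simplifying assumption and not the method from the paper or its repository: the table `EMOJI_TABLE`, the function `encrypt_prompt`, and the example phrases are all hypothetical names introduced here for illustration.

```python
# Toy sketch only: a hand-written phrase-to-emoji table stands in for
# whatever encryption procedure the paper actually uses (an assumption).
EMOJI_TABLE = {
    "running shoes": "👟🏃",
    "coffee maker": "☕🔌",
    "sci-fi novel": "🚀📖",
}


def encrypt_prompt(prompt: str, table: dict[str, str]) -> str:
    """Replace each sensitive phrase found in `prompt` with its emoji code."""
    # Process longer phrases first so overlapping phrases behave predictably.
    for phrase in sorted(table, key=len, reverse=True):
        prompt = prompt.replace(phrase, table[phrase])
    return prompt


# Hypothetical user history that would otherwise be sent in plain text.
history = "The user bought running shoes and a coffee maker."
encrypted = encrypt_prompt(history, EMOJI_TABLE)
print(encrypted)
```

Only `encrypted` would be placed in the prompt sent to the cloud LLM; the plain-text `history` stays local. Matching here is exact and case-sensitive, so a realistic system would need a far more robust way of identifying and encoding sensitive content.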