With the rapid progress of large language models (LLMs), many downstream NLP tasks can be solved well given good prompts. Although model developers and researchers work hard on dialog safety to prevent LLMs from generating harmful content, it remains challenging to steer AI-generated content (AIGC) toward human good. As powerful LLMs devour existing text data from various domains (e.g., GPT-3 is trained on 45TB of text), it is natural to ask whether private information is included in the training data and what privacy threats these LLMs and their downstream applications can bring. In this paper, we study the privacy threats from OpenAI's model APIs and New Bing enhanced by ChatGPT, and show that application-integrated LLMs may cause more severe privacy threats than ever before. We conduct extensive experiments to support our claims and discuss LLMs' privacy implications.