With the rapid progress of large language models (LLMs), many downstream NLP tasks can be solved well given appropriate prompts. Although model developers and researchers work hard on dialog safety to prevent LLMs from generating harmful content, it remains challenging to steer AI-generated content (AIGC) toward the human good. As powerful LLMs consume existing text data from various domains (e.g., GPT-3 is trained on 45TB of text), it is natural to ask whether private information is included in the training data and what privacy threats these LLMs and their downstream applications can bring. In this paper, we study the privacy threats posed by OpenAI's ChatGPT and by the New Bing enhanced by ChatGPT, and show that application-integrated LLMs may introduce new privacy threats. We conduct extensive experiments to support our claims and discuss the privacy implications of LLMs.