Large language models (LLMs) are complex artificial intelligence systems capable of understanding, generating, and translating human language. They learn language patterns by analyzing large amounts of text data, which allows them to perform writing, conversation, summarization, and other language tasks. Because LLMs process and generate large amounts of data, they risk leaking sensitive information, which may threaten data privacy. This paper examines the data privacy concerns associated with LLMs to give a comprehensive picture of them. Specifically, we map the spectrum of data privacy threats, covering both passive privacy leakage and active privacy attacks on LLMs. We then assess the privacy protection mechanisms applied to LLMs at various stages and examine their effectiveness and limitations. Finally, we discuss the open challenges and outline prospective directions for LLM privacy protection.