Large language models (LLMs) are complex artificial intelligence systems capable of understanding, generating, and translating human language. They learn language patterns by analyzing large amounts of text data, which enables them to perform writing, conversation, summarization, and other language tasks. Because LLMs process and generate large amounts of data, they risk leaking sensitive information, which may threaten data privacy. This paper examines the data privacy concerns associated with LLMs in order to foster a comprehensive understanding of them. Specifically, we survey the spectrum of data privacy threats, covering both passive privacy leakage and active privacy attacks on LLMs. We then assess the privacy protection mechanisms applied to LLMs at various stages and examine their efficacy and limitations. Finally, we discuss the challenges that remain and outline prospective directions for LLM privacy protection.