Privacy Issues in Large Language Models: A Survey

This is the first survey of the active area of AI research focused on privacy issues in Large Language Models (LLMs). Specifically, we focus on work that red-teams models to highlight privacy risks, attempts to build privacy into the training or inference process, enables efficient deletion of data from trained models to comply with existing privacy regulations, and tries to mitigate copyright issues. Our focus is on summarizing technical research that develops algorithms, proves theorems, and runs empirical evaluations. While an extensive body of legal and policy work addresses these challenges from a different angle, that work is not the focus of our survey. Nevertheless, these works, along with recent legal developments, do inform how these technical problems are formalized, so we discuss them briefly in Section 1. While we have made our best effort to include all relevant work, the fast-moving nature of this research means we may have missed some recent papers. If we have missed your work, please contact us, as we will attempt to keep this survey relatively up to date. We maintain a repository at https://github.com/safr-ml-lab/survey-llm with the list of papers covered in this survey and any relevant code that was publicly available.

翻译：本文是对人工智能研究活跃领域中聚焦于大型语言模型（LLM）隐私问题的首次综述。具体而言，我们重点梳理以下研究方向：通过红队测试揭示模型隐私风险的工作、尝试在训练或推理过程中嵌入隐私保护机制的研究、实现已训练模型中高效数据删除以符合现有隐私法规的方法，以及试图缓解版权问题的技术。我们关注的核心是那些开发算法、证明定理并进行实证评估的技术性研究总结。虽然已有大量法律与政策类文献从不同角度探讨这些挑战，但这并非本综述的重点。然而，这些研究工作以及近期法律进展确实影响了技术问题的形式化表述，因此我们将在第1节简要讨论。尽管我们尽最大努力纳入所有相关研究，但由于该领域发展迅速，可能遗漏部分最新成果。若您的研究未被收录，请与我们联系，我们将持续更新本综述。我们维护着一个包含本文所涉论文列表及相关公开代码的仓库，地址为：https://github.com/safr-ml-lab/survey-llm。