Privacy Issues in Large Language Models: A Survey

This is the first survey of the active area of AI research that focuses on privacy issues in Large Language Models (LLMs). Specifically, we focus on work that red-teams models to highlight privacy risks, attempts to build privacy into the training or inference process, enables efficient data deletion from trained models to comply with existing privacy regulations, and tries to mitigate copyright issues. Our focus is on summarizing technical research that develops algorithms, proves theorems, and runs empirical evaluations. While there is an extensive body of legal and policy work addressing these challenges from a different angle, that is not the focus of our survey. Nevertheless, these works, along with recent legal developments, do inform how these technical problems are formalized, and so we discuss them briefly in Section 1. While we have made our best effort to include all the relevant work, due to the fast-moving nature of this research, we may have missed some recent work. If we have missed some of your work, please contact us, as we will attempt to keep this survey relatively up to date. We are maintaining a repository with the list of papers covered in this survey and any relevant code that was publicly available at https://github.com/safr-ml-lab/survey-llm.
