This paper explores the risk that a large language model (LLM) trained for code generation on data mined from software repositories will generate content that discloses sensitive information included in its training data. We decompose this risk, known in the literature as ``unintended memorization,'' into two components: unintentional disclosure (where an LLM presents secrets to users without the user seeking them out) and malicious disclosure (where an LLM presents secrets to an attacker equipped with partial knowledge of the training data). We observe that while existing work mostly anticipates malicious disclosure, unintentional disclosure is also a concern. We describe methods to assess unintentional and malicious disclosure risks side-by-side across different releases of training datasets and models. We demonstrate these methods through an independent assessment of the Open Language Model (OLMo) family of models and its Dolma training datasets. Our results show, first, that changes in data source and processing are associated with substantial changes in unintended memorization risk; second, that the same set of operational changes may increase one risk while mitigating another; and, third, that the risk of disclosing sensitive information varies not only by prompt strategies or test datasets but also by the types of sensitive information. These contributions rely on data mining to enable the greater privacy and security testing required for the LLM training data supply chain.