Mitigating Sensitive Information Leakage in LLMs4Code through Machine Unlearning

Large Language Models for Code (LLMs4Code) have achieved strong performance in code generation, but recent studies reveal that they may memorize and leak sensitive information contained in their training data, posing serious privacy risks. To address this problem, this work presents the first comprehensive empirical study on applying machine unlearning to mitigate sensitive information leakage in LLMs4Code. We first construct a dedicated benchmark that includes: (i) a synthetic forget set containing diverse forms of personal information, and (ii) a retain set designed to evaluate whether code-generation capability is preserved after unlearning. Using this benchmark, we systematically assess three representative unlearning algorithms (GA, GA+GD, GA+KL) across three widely used open-source LLMs4Code (AIXCoder-7B, CodeLlama-7B, CodeQwen-7B). Experimental results demonstrate that machine unlearning can substantially reduce direct memorization-based leakage: on average, the direct leak rate drops by more than 50%, while over 91% of the original code-generation performance is retained. Moreover, by analyzing post-unlearning outputs, we uncover a consistent shift from direct to indirect leakage, revealing an underexplored vulnerability that persists even when the target data has been successfully forgotten. Our findings show that machine unlearning is a feasible and effective solution for enhancing privacy protection in LLMs4Code, while also highlighting the need for future techniques capable of mitigating both direct and indirect leakage simultaneously.
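The abstract names the three unlearning algorithms only by abbreviation. As a rough illustration, the sketch below shows how such objectives are commonly formulated in the unlearning literature, assuming GA denotes gradient ascent on the forget set, GD denotes ordinary gradient descent on the retain set, and KL denotes a KL-divergence penalty that keeps the unlearned model close to the original model on retain data. These expansions, the direction of the KL term, the equal weighting of the terms, and all names in the code (`nll`, `kl`, `unlearning_loss`) are assumptions made for illustration and are not taken from the paper. The toy version operates on a single next-token distribution rather than on a real model.

```python
import math


def nll(probs, target):
    """Negative log-likelihood of the target token under one next-token distribution."""
    return -math.log(probs[target])


def kl(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def unlearning_loss(method, forget_probs, forget_target,
                    retain_probs=None, retain_target=None,
                    ref_retain_probs=None):
    """Toy loss to MINIMIZE for one forget example and one retain example.

    Assumed (hypothetical) formulations:
      GA     : -NLL(forget)                  ascend on the forget set
      GA+GD  : -NLL(forget) + NLL(retain)    also descend on the retain set
      GA+KL  : -NLL(forget) + KL(ref || current) on the retain input
    """
    # Negating the forget-set NLL turns gradient descent into gradient ascent.
    forget_term = -nll(forget_probs, forget_target)
    if method == "GA":
        return forget_term
    if method == "GA+GD":
        return forget_term + nll(retain_probs, retain_target)
    if method == "GA+KL":
        return forget_term + kl(ref_retain_probs, retain_probs)
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    forget = [0.5, 0.5]      # current model's distribution on a forget example
    retain = [0.25, 0.75]    # current model's distribution on a retain example
    reference = [0.2, 0.8]   # original (pre-unlearning) model on the same retain input
    for m in ("GA", "GA+GD", "GA+KL"):
        print(m, unlearning_loss(m, forget, 0, retain, 1, reference))
```

In this formulation the retain-side terms are what would counteract the loss of code-generation ability that pure gradient ascent can cause; in practice the losses would be computed over token sequences with a deep-learning framework rather than by hand.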
