Code Large Language Models (LLMs) have demonstrated remarkable capabilities in generating, understanding, and manipulating programming code. However, their training process inadvertently leads to the memorization of sensitive information, posing severe privacy risks. Existing studies on memorization in LLMs rely primarily on prompt engineering, which suffers from widespread hallucination and inefficient extraction of the targeted sensitive information. In this paper, we present a novel approach to characterizing real and fake secrets generated by Code LLMs based on token probabilities. We identify four key characteristics that differentiate genuine secrets from hallucinated ones, providing insight into distinguishing real from fake secrets. To overcome the limitations of existing work, we propose DESEC, a two-stage method that leverages token-level features derived from the identified characteristics to guide the token decoding process. DESEC first constructs an offline token scoring model using a proxy Code LLM, then employs the scoring model to guide decoding by reassigning token likelihoods. Through extensive experiments on four state-of-the-art Code LLMs with a diverse dataset, we demonstrate that DESEC outperforms existing baselines, achieving a higher plausible rate and extracting more real secrets. Our findings highlight the effectiveness of our token-level approach in enabling an extensive assessment of the privacy leakage risks posed by Code LLMs.
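To make the decoding-guidance idea concrete, the following minimal Python sketch (built on the Hugging Face transformers API) reassigns next-token likelihoods during generation. The model name, the blending weight alpha, and the placeholder score_fn are illustrative assumptions only; in the paper, the scoring model is learned offline with a proxy Code LLM from the four identified characteristics, and this sketch substitutes a trivial stand-in.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "codellama/CodeLlama-7b-hf"  # assumption: any causal Code LLM works here
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def score_fn(token_logprobs: torch.Tensor) -> torch.Tensor:
    # Hypothetical stand-in for the offline token scoring model: it simply
    # favors higher-probability tokens. DESEC would instead score tokens
    # using features derived from the four identified characteristics.
    return torch.sigmoid(token_logprobs)

@torch.no_grad()
def guided_decode(prompt: str, max_new_tokens: int = 32, alpha: float = 0.5) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        logits = model(ids).logits[0, -1]              # next-token logits
        logprobs = torch.log_softmax(logits, dim=-1)   # LLM's own likelihoods
        scores = score_fn(logprobs)                    # per-token "realness" scores
        # Reassign likelihoods: blend the LLM's log-probability with the
        # scoring model's judgment, then pick the best token greedily.
        adjusted = (1 - alpha) * logprobs + alpha * torch.log(scores + 1e-9)
        next_id = adjusted.argmax().view(1, 1)
        ids = torch.cat([ids, next_id], dim=-1)
    return tok.decode(ids[0], skip_special_tokens=True)

The greedy selection over adjusted scores is one simple choice; any standard decoding strategy (sampling, beam search) can consume the reassigned likelihoods in the same way.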