Securing Code Understanding: Detecting Natural Backdoor Vulnerability in Code Language Models

Yuchen Chen, Weisong Sun, Haocheng Huang, Yuan Xiao, Chunrong Fang, Yiran Zhang, Tingting Xu, Zhenpeng Chen, An Guo, Peizhuo Lv, Xiaofang Zhang, Zhenyu Chen, Yang Liu, Baowen Xu

From arXiv. Accepted to IEEE Transactions on Software Engineering (TSE).

Code Language Models (CodeLMs) have become integral to software engineering, significantly advancing code intelligence tasks. However, their widespread adoption has raised critical security concerns, particularly regarding their susceptibility to backdoor attacks. Recent studies have uncovered naturally occurring backdoors, referred to as natural backdoors, in normally trained deep learning models. Although natural backdoors pose threats as serious as those introduced through data poisoning, the security implications of natural backdoor vulnerabilities in CodeLMs remain poorly understood. In this paper, we conduct a thorough empirical study of natural backdoor vulnerabilities in CodeLMs across various model architectures and code intelligence tasks. Specifically, we examine potential natural backdoor vulnerabilities across 44 scenarios, demonstrating that natural backdoors are prevalent and intrinsic to CodeLMs. We reveal differences between injected and natural backdoor vulnerabilities at both the model and parameter levels. We then analyze the transferability of natural backdoor vulnerabilities from three perspectives: datasets, model architectures, and shared knowledge. We further investigate the causes of natural backdoors from two aspects: training datasets and the model training procedure. We evaluate how effectively existing backdoor defense techniques, including pre-training, in-training, and post-training defenses, mitigate natural backdoors. Finally, we propose ScanNBT, a novel method designed to detect natural backdoor vulnerabilities in CodeLMs more comprehensively. We aim for our findings to enhance understanding of these vulnerabilities and provide insights for strengthening CodeLM security against backdoor threats.
