Code generation large language models (LLMs) are increasingly integrated into modern software development workflows. Recent work has shown that these models are vulnerable to backdoor and poisoning attacks that induce the generation of insecure code, yet effective defenses remain limited. Existing scanning approaches rely on token-level generation consistency to invert attack targets, which is ineffective for source code where identical semantics can appear in diverse syntactic forms. We present CodeScan, which, to the best of our knowledge, is the first poisoning-scanning framework tailored to code generation models. CodeScan identifies attack targets by analyzing structural similarities across multiple generations conditioned on different clean prompts. It combines iterative divergence analysis with abstract syntax tree (AST)-based normalization to abstract away surface-level variation and unify semantically equivalent code, isolating structures that recur consistently across generations. CodeScan then applies LLM-based vulnerability analysis to determine whether the extracted structures contain security vulnerabilities and flags the model as compromised when such a structure is found. We evaluate CodeScan against four representative attacks under both backdoor and poisoning settings across three real-world vulnerability classes. Experiments on 108 models spanning three architectures and multiple model sizes demonstrate 97%+ detection accuracy with substantially lower false positives than prior methods.
翻译:代码生成大语言模型正日益融入现代软件开发工作流。近期研究表明,这些模型容易遭受后门和投毒攻击,导致生成不安全的代码,然而有效的防御措施仍然有限。现有扫描方法依赖令牌级生成一致性来反推攻击目标,这对于源代码而言效果不佳,因为相同的语义可能以多种句法形式呈现。我们提出了CodeScan,据我们所知,这是首个专为代码生成模型设计的投毒扫描框架。CodeScan通过分析基于不同干净提示生成的多个代码之间的结构相似性来识别攻击目标。该方法将迭代差异分析与基于抽象语法树的归一化技术相结合,以抽象掉表层差异并统一语义等价的代码,从而分离出在多次生成中持续复现的结构。随后,CodeScan应用基于大语言模型的漏洞分析来判断提取的结构是否包含安全漏洞,并在发现此类结构时将模型标记为已遭篡改。我们在后门和投毒两种设置下,针对三个真实漏洞类别,对CodeScan进行了四种代表性攻击的评估。在涵盖三种架构、多种模型规模的108个模型上的实验表明,该方法实现了97%以上的检测准确率,且误报率显著低于现有方法。