Although Large Language Models (LLMs) show promise for automated code generation, they often produce insecure code that threatens software security. Current approaches to improving secure code generation (e.g., SafeCoder) suffer from limited and imbalanced datasets, reducing their effectiveness and generalizability. In this work, we present Secure-Instruct, a novel framework that automatically synthesizes high-quality vulnerable and secure code examples, generates fine-tuning instructions, and instruction-tunes LLMs to align task descriptions with secure code generation. We evaluate Secure-Instruct on four representative LLMs using two benchmarks: our own CWEBench and the existing CWEval. CWEBench comprises 93 scenarios covering 44 CWEs, none of which overlap with Secure-Instruct's synthetic instruction-tuning dataset, while CWEval covers 31 CWEs with 119 manually verified security-critical tasks. We find that Secure-Instruct improves not only the security but also the functional correctness of the generated code. On CWEBench, Secure-Instruct substantially improves secure code generation, achieving a 14.3% average increase in secure ratio over the pretrained models and outperforming SafeCoder by 7.6%. On CWEval, Secure-Instruct achieves a 14% increase in Func-Sec@1 for CodeLlama-7B and a 5.8% increase for Mistral-7B over the pretrained models, surpassing SafeCoder by 15.8% and 6.8%, respectively.