Large models (LMs) have gained widespread acceptance in code-related tasks, and their strong generative capacity has greatly promoted the adoption of code LMs. Nevertheless, the security of the generated code has drawn attention because of its potential for harm. Existing secure code generation methods generalize poorly to unseen test cases and are not robust when the underlying model has been attacked, leading to security failures in code generation. In this paper, we propose SecCoder, a generalizable and robust secure code generation method that uses in-context learning (ICL) with safe demonstrations. A dense retriever selects the most helpful demonstration in order to maximize the security improvement of the generated code. Experimental results show that SecCoder generalizes better than the current secure code generation method, achieving a significant average security improvement of 7.20% on unseen test cases. The results also show that SecCoder is more robust than the current attacked code LM, achieving a significant average security improvement of 7.74%. Our analysis indicates that SecCoder enhances the security of the code generated by LMs and is more generalizable and robust.
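The retrieval-augmented ICL pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the toy bag-of-words embedding stands in for a trained dense retriever, and the demonstration pool, `retrieve_safe_demo`, and `build_prompt` are hypothetical names introduced here for illustration.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; SecCoder would use a trained dense encoder.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_safe_demo(query, demos):
    # Select the safe demonstration whose task is most similar to the query.
    q = embed(query)
    return max(demos, key=lambda d: cosine(q, embed(d["task"])))

def build_prompt(query, demo):
    # Prepend the retrieved safe demonstration as an in-context example.
    return (f"# Safe example:\n# Task: {demo['task']}\n{demo['code']}\n\n"
            f"# Task: {query}\n")

# Hypothetical pool of secure coding demonstrations.
safe_demos = [
    {"task": "run a shell command with user input",
     "code": "subprocess.run(['ls', user_dir], check=True)  # no shell=True"},
    {"task": "build a SQL query from user input",
     "code": "cur.execute('SELECT * FROM users WHERE id = ?', (user_id,))"},
]

query = "construct a SQL query from a web form field"
demo = retrieve_safe_demo(query, safe_demos)
prompt = build_prompt(query, demo)
# `prompt` would then be passed to the code LM for generation.
```

The key design point is that the retriever is query-dependent: the demonstration most semantically similar to the current task (here, the parameterized-SQL example) is placed in context, so the LM is steered toward the secure pattern relevant to the code it is about to generate.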