Large Language Models (LLMs) have shown impressive proficiency in code generation. Unfortunately, these models share a weakness with their human counterparts: producing code that inadvertently contains security vulnerabilities. These vulnerabilities could allow unauthorized attackers to access sensitive data or systems, which is unacceptable for safety-critical applications. In this work, we propose Feedback-Driven Security Patching (FDSP), in which LLMs automatically refine generated, vulnerable code. Our approach leverages automatic static code analysis to empower the LLM to generate and implement potential solutions that address vulnerabilities. We address the research community's need for safe code generation by introducing a large-scale dataset, PythonSecurityEval, covering the diversity of real-world applications, including databases, websites, and operating systems. We empirically validate that FDSP outperforms prior work that relies on self-feedback from LLMs by up to 17.6%, through a procedure that injects targeted, external feedback. Code and data are available at \url{https://github.com/Kamel773/LLM-code-refine}
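The feedback loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `analyze` and `llm_propose_fix` are hypothetical stand-ins for a real static analyzer (e.g., Bandit) and an actual LLM call, shown here only to make the iterate-until-clean structure concrete.

```python
# Minimal sketch of a Feedback-Driven Security Patching (FDSP) loop.
# Both helper functions are hypothetical stand-ins: a real setup would
# invoke a static analyzer such as Bandit and an LLM API.

def analyze(code: str) -> list[str]:
    """Stand-in static analyzer: flags string-interpolated SQL (CWE-89)."""
    issues = []
    if "execute(f" in code:
        issues.append("possible SQL injection via string formatting")
    return issues

def llm_propose_fix(code: str, issues: list[str]) -> str:
    """Stand-in for the LLM patching step; returns a parameterized query."""
    return 'cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))'

def fdsp_refine(code: str, max_rounds: int = 3) -> str:
    """Feed analyzer findings back to the LLM until no issues remain."""
    for _ in range(max_rounds):
        issues = analyze(code)
        if not issues:
            break
        code = llm_propose_fix(code, issues)
    return code

vulnerable = 'cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")'
patched = fdsp_refine(vulnerable)
```

The key design point is that the analyzer's findings, rather than the model's own self-assessment, drive each refinement round, which is the external-feedback signal the abstract contrasts with self-feedback baselines.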