Large Language Models (LLMs) have shown impressive proficiency in code generation. Nonetheless, like human developers, these models may generate code that contains security vulnerabilities and flaws. Writing secure code remains a substantial challenge, as vulnerabilities often arise during interactions between programs and external systems or services, such as databases and operating systems. In this paper, we propose a novel approach, Feedback-Driven Solution Synthesis (FDSS), in which an LLM receives feedback from Bandit, a static code analysis tool, and generates potential solutions to resolve the reported security vulnerabilities. Each solution, along with the vulnerable code, is then sent back to the LLM for code refinement. Our approach shows a significant improvement over the baseline and outperforms existing approaches. Furthermore, we introduce a new dataset, PythonSecurityEval, collected from real-world scenarios on Stack Overflow, to evaluate the ability of LLMs to generate secure code. Code and data are available at \url{https://github.com/Kamel773/LLM-code-refine}.
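The feedback loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the released implementation: the function names (`run_bandit`, `fdss_refine`), the prompt wording, the parameters `n_solutions` and `max_rounds`, and the rule of keeping the candidate with the fewest remaining findings are all hypothetical. The analyzer and the LLM are passed in as plain callables so that any model client can be substituted.

```python
import json
import os
import subprocess
import tempfile
from typing import Callable, List

# Hypothetical interfaces: an analyzer maps source code to a list of issue
# descriptions; an LLM maps a prompt string to a completion string.
Analyzer = Callable[[str], List[str]]
LLM = Callable[[str], str]


def run_bandit(code: str) -> List[str]:
    """Run the Bandit CLI on `code`; return one description per finding."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        # Bandit exits non-zero when it reports issues, so do not use check=True.
        proc = subprocess.run(
            ["bandit", "-q", "-f", "json", path],
            capture_output=True, text=True, check=False,
        )
        report = json.loads(proc.stdout)
    finally:
        os.unlink(path)
    return [
        f"{r['test_id']} line {r['line_number']}: {r['issue_text']}"
        for r in report.get("results", [])
    ]


def fdss_refine(code: str, analyze: Analyzer, llm: LLM,
                n_solutions: int = 3, max_rounds: int = 2) -> str:
    """Refine `code` using analyzer feedback and LLM-proposed solutions."""
    for _ in range(max_rounds):
        issues = analyze(code)
        if not issues:
            return code  # nothing left to fix
        feedback = "\n".join(issues)

        # Step 1: ask the LLM for several candidate solutions, one per line.
        reply = llm(
            f"List {n_solutions} distinct ways to fix these issues, "
            f"one per line.\nIssues:\n{feedback}\n\nCode:\n{code}"
        )
        solutions = [s.strip() for s in reply.splitlines() if s.strip()]

        # Step 2: send each solution back together with the vulnerable code.
        # Assumed selection rule: keep the candidate with the fewest findings.
        best, best_count = code, len(issues)
        for solution in solutions[:n_solutions]:
            candidate = llm(
                f"Rewrite the code applying this fix: {solution}\n"
                f"Return only code.\n\nCode:\n{code}"
            )
            count = len(analyze(candidate))
            if count < best_count:
                best, best_count = candidate, count
        code = best
    return code
```

With Bandit installed and some function `my_llm` wrapping a model client (both assumptions), a call would look like `fdss_refine(source, run_bandit, my_llm)`. Injecting the analyzer also makes it possible to exercise the loop with stubs, without invoking Bandit or a model.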