LLM-Powered Code Vulnerability Repair with Reinforcement Learning and Semantic Reward

In software development, functionality is often prioritized over security, a trend that is gaining momentum with AI-driven automation tools such as GitHub Copilot. These tools significantly improve developers' efficiency in writing functional code. However, they are also known to produce insecure code, largely because they are pre-trained on public repositories that contain vulnerable code. Moreover, developers are often called the "weakest link in the chain" because many have limited knowledge of code security. Although existing solutions offer reasonable fixes for vulnerable code, they must also adequately describe the vulnerability and educate developers on code security so that the same issues are not repeated. We therefore introduce \texttt{SecRepair}, a multipurpose code vulnerability analysis system powered by a large language model, CodeGen2, that assists developers in identifying vulnerabilities and generating fixed code, together with a complete description of the vulnerability in a code comment. Our methodology uses a reinforcement learning paradigm, augmented by a semantic reward mechanism, to generate the code comments. Inspired by how humans fix code issues, we propose an instruction-based dataset suitable for vulnerability analysis with LLMs. We further identify zero-day and N-day vulnerabilities in 6 open-source IoT operating systems on GitHub. Our findings show that combining reinforcement learning with a semantic reward improves the model's performance and strengthens its ability to address code vulnerabilities.
