In software development, the predominant emphasis on functionality often supersedes security concerns, a trend gaining momentum with AI-driven automation tools such as GitHub Copilot. These tools significantly improve developers' efficiency in functional code development. Nevertheless, it remains a notable concern that such tools also produce insecure code, largely because they are pre-trained on publicly available repositories that contain vulnerable code. Moreover, developers are often called the "weakest link in the chain" because they typically have minimal knowledge of code security. Although existing solutions offer reasonable fixes for vulnerable code, they fail to adequately describe the vulnerabilities and educate developers on code security, so the same security issues recur. Therefore, we introduce \texttt{SecRepair}, a multipurpose code vulnerability analysis system powered by the large language model CodeGen2, which assists developers in identifying vulnerabilities and generating fixed code along with a complete description of the vulnerability as a code comment. Our methodology uses a reinforcement learning paradigm to generate code comments, augmented by a semantic reward mechanism. Inspired by how humans fix code issues, we propose an instruction-based dataset suitable for vulnerability analysis with LLMs. We further identify zero-day and N-day vulnerabilities in 6 open-source IoT operating systems on GitHub. Our findings underscore that incorporating reinforcement learning coupled with a semantic reward improves our model's performance, thereby strengthening its capacity to address code vulnerabilities with greater efficacy.
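The abstract's "semantic reward" for reinforcement learning can be sketched as a similarity score between a generated code comment and a reference vulnerability description, used to weight the policy-gradient update. The sketch below is illustrative only: it substitutes a simple bag-of-words cosine similarity for whatever embedding-based metric the paper actually uses, and the function name `semantic_reward` is our own.

```python
from collections import Counter
import math

def semantic_reward(generated: str, reference: str) -> float:
    """Illustrative semantic reward: cosine similarity between
    bag-of-words vectors of the generated comment and the reference
    vulnerability description (a stand-in for an embedding metric).
    Returns a value in [0, 1] used to scale the RL update, e.g.
    loss = -reward * log_prob(generated_comment)."""
    g = Counter(generated.lower().split())
    r = Counter(reference.lower().split())
    dot = sum(g[t] * r[t] for t in g)
    norm = (math.sqrt(sum(v * v for v in g.values()))
            * math.sqrt(sum(v * v for v in r.values())))
    return dot / norm if norm else 0.0
```

A comment that restates the reference description closely receives a reward near 1, while an unrelated comment receives a reward near 0, pushing the model toward faithful vulnerability explanations.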