Large language models (LLMs) are increasingly used in software development, yet their tendency to generate insecure code remains a major barrier to real-world deployment. Existing secure code alignment methods often suffer from a functionality-security paradox, in which security improves only at the cost of substantial utility degradation. We propose SecCoderX, an online reinforcement learning (RL) framework for functionality-preserving secure code generation. SecCoderX bridges vulnerability detection and secure code generation by repurposing mature detection resources in two ways: (i) synthesizing diverse, reality-grounded vulnerability-inducing coding tasks for online RL rollouts, and (ii) training a reasoning-based vulnerability reward model that provides scalable and reliable security supervision. These components are combined in an online RL loop that aligns code LLMs to generate secure and functional code. Extensive experiments demonstrate that SecCoderX achieves state-of-the-art performance, improving Effective Safety Rate (ESR) by approximately 10% over unaligned models, whereas prior methods often degrade ESR by 14-54%. We release our code, dataset, and model checkpoints at https://github.com/AndrewWTY/SecCoderX.
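The functionality-security paradox can be illustrated with a small numerical sketch. The abstract does not define ESR, so the code below assumes a joint metric that counts a generated sample only when it is both functional and secure; the `Sample` type, the two rate functions, and the toy data are hypothetical and are not taken from the SecCoderX release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One generated program with two hypothetical boolean judgments."""
    functional: bool  # assumed: passed the task's functionality checks
    secure: bool      # assumed: no vulnerability was flagged


def security_rate(samples):
    """Fraction of samples judged secure, ignoring functionality."""
    if not samples:
        raise ValueError("samples must be non-empty")
    return sum(s.secure for s in samples) / len(samples)


def joint_rate(samples):
    """Fraction of samples judged both functional and secure.

    This is an assumed stand-in for a joint metric such as ESR,
    not the paper's definition.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    return sum(s.functional and s.secure for s in samples) / len(samples)


# Toy data: an unaligned model and a model aligned for security only.
baseline = (
    [Sample(True, True)] * 5
    + [Sample(True, False)] * 4
    + [Sample(False, True)] * 1
)
over_aligned = (
    [Sample(True, True)] * 4
    + [Sample(False, True)] * 5
    + [Sample(True, False)] * 1
)

for name, samples in [("baseline", baseline), ("over_aligned", over_aligned)]:
    print(name, security_rate(samples), joint_rate(samples))
```

In this toy data the security-only rate rises from 0.6 to 0.9 while the joint rate falls from 0.5 to 0.4, which is the kind of degradation a security-only metric would hide.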