Large Language Models are rapidly becoming core components of modern software development workflows, yet ensuring code security remains challenging. Existing vulnerability detection pipelines either rely on static analyzers or use LLM/GNN-based detectors trained with coarse program-level supervision. Both families often require complete context, provide sparse end-of-completion feedback, and can degrade as code length grows, making them ill-suited for real-time, prefix-level assessment during interactive coding and streaming generation. We propose SecCodePRM, a security-oriented process reward model that assigns a context-aware, step-level security score along a code trajectory. To train the model, we derive step-level supervision labels from static analyzers and expert annotations, allowing the model to attend more precisely to fine-grained regions associated with inter-procedural vulnerabilities. SecCodePRM has three applications: full-code vulnerability detection (VD), partial-code VD, and secure code generation (CG). For VD, SecCodePRM uses risk-sensitive aggregation that emphasizes high-risk steps; for CG, SecCodePRM supports inference-time scaling by ranking candidate continuations and favoring higher cumulative reward. This design yields dense, real-time feedback that scales to long-horizon generation. Empirically, SecCodePRM outperforms prior approaches in all three settings, while preserving code functional correctness, suggesting improved security without a safety-utility tradeoff.
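To make the two scoring mechanisms concrete, here is a minimal sketch of what risk-sensitive aggregation and cumulative-reward ranking could look like. The paper does not specify these formulas; the soft-min (log-sum-exp) aggregator, the `alpha` sharpness parameter, and the function names are illustrative assumptions, not SecCodePRM's actual implementation.

```python
import math

def risk_sensitive_aggregate(step_scores, alpha=5.0):
    """Aggregate per-step security scores so high-risk (low-score) steps dominate.

    Illustrative soft-min via log-sum-exp over negated scores; alpha controls
    how sharply the riskiest steps are emphasized (assumed form, not from the paper).
    """
    neg = [-alpha * s for s in step_scores]
    m = max(neg)
    lse = m + math.log(sum(math.exp(v - m) for v in neg))
    return -lse / alpha  # always <= min(step_scores); approaches it as alpha grows

def rank_candidates(candidates):
    """Rank candidate continuations for inference-time scaling.

    candidates: list of (code, step_scores) pairs; higher cumulative reward first.
    """
    return sorted(candidates, key=lambda c: sum(c[1]), reverse=True)
```

A trajectory with one risky step scores well below a uniformly safe one, even when most steps look fine, which is the behavior an end-of-completion (mean or final-step) score would miss.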