Vulnerability detection in C programs is a critical challenge in software security. Although large language models (LLMs) achieve strong detection performance, their multi-billion-parameter scale makes them impractical to integrate into development workflows that require low latency and continuous analysis. We introduce VULNSCOUT-C, a compact transformer architecture with 693M total parameters (353M active during inference), derived from the Qwen model family and optimized for C code vulnerability detection. Alongside the model, we present VULNSCOUT, a new curated dataset of 33,565 samples generated through a controlled multi-agent pipeline with formal verification and designed to fill coverage gaps in existing benchmarks for underrepresented CWE categories. Evaluated on a standardized C vulnerability detection benchmark, VULNSCOUT-C outperforms all evaluated baselines, including state-of-the-art reasoning LLMs and commercial static analysis tools, at a fraction of their inference cost. These results demonstrate that task-specialized compact architectures can match or even outperform the detection capability of models orders of magnitude larger, making continuous, low-latency vulnerability analysis practical within real-world development workflows.