Large language models (LLMs) are transforming everyday applications, yet their deployment in cybersecurity lags due to a lack of high-quality, domain-specific models and training datasets. To address this gap, we present CyberPal 2.0, a family of cybersecurity-expert small language models (SLMs) ranging from 4B to 20B parameters. To train CyberPal 2.0, we generate an enriched chain-of-thought cybersecurity instruction dataset built with our data enrichment and formatting pipeline, SecKnowledge 2.0, which integrates expert-in-the-loop steering of reasoning formats with LLM-driven multi-step grounding, yielding higher-fidelity, task-grounded reasoning traces for security tasks. Across diverse cybersecurity benchmarks, CyberPal 2.0 consistently outperforms its baselines and matches or surpasses various open- and closed-source frontier models, while remaining a fraction of their size. On core cyber threat intelligence knowledge tasks, our models outperform almost all tested frontier models, ranking second only to Sec-Gemini v1. On core threat-investigation tasks, such as correlating vulnerabilities and bug tickets with weaknesses, our best 20B-parameter model outperforms GPT-4o, o1, o3-mini, and Sec-Gemini v1, ranking first, while our smallest 4B-parameter model ranks second.