Large language models (LLMs) are transforming everyday applications, yet their deployment in cybersecurity lags due to a lack of high-quality, domain-specific models and training datasets. To address this gap, we present CyberPal 2.0, a family of cybersecurity-expert small language models (SLMs) ranging from 4B to 20B parameters. To train CyberPal 2.0, we generate an enriched chain-of-thought cybersecurity instruction dataset built with our data enrichment and formatting pipeline, SecKnowledge 2.0, which integrates expert-in-the-loop steering of reasoning formats with LLM-driven multi-step grounding, yielding higher-fidelity, task-grounded reasoning traces for security tasks. Across diverse cybersecurity benchmarks, CyberPal 2.0 consistently outperforms its baselines and matches or surpasses various open- and closed-source frontier models, while remaining a fraction of their size. On core cyber threat intelligence knowledge tasks, our models outperform almost all tested frontier models, ranking second only to Sec-Gemini v1. On core threat-investigation tasks, such as correlating vulnerabilities and bug tickets with weaknesses, our best 20B-parameter model outperforms GPT-4o, o1, o3-mini, and Sec-Gemini v1, ranking first, while our smallest 4B-parameter model ranks second.