Software vulnerabilities pose critical security and risk concerns for many software systems. Many techniques have been proposed to assess and prioritize these vulnerabilities before they cause serious consequences. To evaluate performance, the authors of these solutions often craft their own experimental datasets from a limited set of information sources, such as MITRE CVE and NVD, and therefore lack a global overview of broad vulnerability intelligence. The repetitive data preparation process further complicates the verification and comparison of new solutions. To resolve this issue, we propose VulZoo, a comprehensive vulnerability intelligence dataset that covers 17 popular vulnerability information sources. We also construct connections among these sources, enabling more straightforward configuration and adaptation for different vulnerability assessment tasks (e.g., vulnerability type prediction). Additionally, VulZoo provides utility scripts for automatic data synchronization and cleaning, relationship mining, and statistics generation. We make VulZoo publicly available and maintain it with incremental updates to facilitate future research. We believe that VulZoo serves as a valuable input to vulnerability assessment and prioritization studies. The dataset and utility scripts are available at https://github.com/NUS-Curiosity/VulZoo.