VulZoo: A Comprehensive Vulnerability Intelligence Dataset

Software vulnerabilities pose critical security and risk concerns for many software systems. Many techniques have been proposed to effectively assess and prioritize these vulnerabilities before they cause serious consequences. To evaluate their performance, these solutions often craft their own experimental datasets from limited information sources, such as MITRE CVE and NVD, and therefore lack a global overview of broad vulnerability intelligence. The repetitive data preparation process further complicates the verification and comparison of new solutions. To resolve this issue, in this paper, we propose VulZoo, a comprehensive vulnerability intelligence dataset that covers 17 popular vulnerability information sources. We also construct connections among these sources, enabling more straightforward configuration and adaptation for different vulnerability assessment tasks (e.g., vulnerability type prediction). Additionally, VulZoo provides utility scripts for automatic data synchronization and cleaning, relationship mining, and statistics generation. We make VulZoo publicly available and maintain it with incremental updates to facilitate future research. We believe that VulZoo serves as a valuable input to vulnerability assessment and prioritization studies. The dataset with utility scripts is available at https://github.com/NUS-Curiosity/VulZoo.

