CHRONOS: Time-Aware Zero-Shot Identification of Libraries from Vulnerability Reports

Tools that alert developers about library vulnerabilities depend on accurate, up-to-date vulnerability databases, which are maintained by security researchers. These databases record the libraries related to each vulnerability. However, vulnerability reports may not explicitly list every library, so human analysis is required to determine all the relevant libraries. Human analysis can be slow and expensive, which motivates the need for automated approaches. Researchers and practitioners have proposed to automatically identify libraries from vulnerability reports using extreme multi-label learning (XML). While state-of-the-art XML techniques showed promising performance, their experimental settings do not reflect what happens in practice. Previous studies randomly split the vulnerability report data for training and testing, without considering the chronological order of the reports. As a result, models may be trained on chronologically newer reports and tested on chronologically older ones. In practice, however, one receives chronologically new reports, which may be related to previously unseen libraries. Under this practical setting, we observe that the performance of current XML techniques declines substantially: for example, F1 drops from 0.7 when the chronological order of vulnerability reports is ignored to 0.24 when it is respected. We propose a practical library identification approach, namely CHRONOS, based on zero-shot learning. The novelty of CHRONOS is three-fold. First, CHRONOS fits into the practical pipeline by considering the chronological order of vulnerability reports. Second, CHRONOS enriches the data of the vulnerability descriptions and labels using a carefully designed data enhancement step. Third, CHRONOS exploits the temporal ordering of the vulnerability reports using a cache to prioritize prediction of...
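To make the difference between a random split and a chronological split concrete, the following is a minimal sketch, not the authors' pipeline. The toy reports, library names, cutoff date, and helper functions are all hypothetical and chosen only for illustration. It splits reports by publication date and lists the libraries that appear only in the newer test reports, which is the situation where a zero-shot approach is needed.

```python
from datetime import date

# Hypothetical toy data: (publication date, report id, affected libraries).
reports = [
    (date(2021, 3, 3), "R4", {"libC"}),
    (date(2019, 1, 5), "R1", {"libA"}),
    (date(2021, 8, 20), "R5", {"libA", "libD"}),
    (date(2019, 6, 1), "R2", {"libA", "libB"}),
    (date(2020, 2, 9), "R3", {"libB"}),
]


def chronological_split(reports, cutoff):
    """Train on reports published before `cutoff`, test on the rest."""
    ordered = sorted(reports, key=lambda r: r[0])
    train = [r for r in ordered if r[0] < cutoff]
    test = [r for r in ordered if r[0] >= cutoff]
    return train, test


def unseen_labels(train, test):
    """Libraries that occur in test reports but in no training report."""
    seen = set().union(*[libs for _, _, libs in train])
    in_test = set().union(*[libs for _, _, libs in test])
    return in_test - seen


# Assumed cutoff date, chosen arbitrarily for this example.
train, test = chronological_split(reports, cutoff=date(2021, 1, 1))
zero_shot_labels = unseen_labels(train, test)

print([rid for _, rid, _ in train])
print([rid for _, rid, _ in test])
print(sorted(zero_shot_labels))
```

With a random split, reports such as R4 or R5 could land in the training set, so their libraries would no longer be unseen at test time. That is one way an evaluation that ignores chronology can look more favorable than real use.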

