Security advisories are the primary channel for communicating discovered vulnerabilities in open-source software, but they often lack crucial information. Specifically, 63% of vulnerability database reports are missing their patch links, also referred to as vulnerability fixing commits (VFCs). This paper introduces VFCFinder, a tool that uses natural language-programming language (NL-PL) models to generate a ranked set of the five most likely VFCs for a given security advisory. VFCFinder achieves 96.6% recall for finding the correct VFC within the Top-5 commits and 80.0% recall for the Top-1 ranked commit. VFCFinder generalizes to nine different programming languages and outperforms state-of-the-art approaches by 36 percentage points in Top-1 recall. As a practical contribution, we used VFCFinder to backfill over 300 missing VFCs in the GitHub Security Advisory (GHSA) database. All of these VFCs were accepted and merged into the GHSA database. In addition to demonstrating a practical pairing of security advisories with VFCs, our general open-source implementation will allow vulnerability database maintainers to drastically improve data quality, supporting efforts to secure the software supply chain.
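The Top-1 and Top-5 recall figures above can be read as the fraction of advisories whose true VFC appears among the first k ranked commits. The following is a minimal sketch of that metric under this assumption; the function name `top_k_recall`, the advisory IDs, and the commit IDs are hypothetical toy values and are not taken from VFCFinder's implementation or evaluation data.

```python
from typing import Dict, List


def top_k_recall(ranked: Dict[str, List[str]], truth: Dict[str, str], k: int) -> float:
    """Fraction of advisories whose true VFC is among the first k ranked commits.

    ranked: advisory ID -> commit IDs ordered from most to least likely (hypothetical).
    truth:  advisory ID -> the single known fixing commit (hypothetical).
    """
    if not truth:
        return 0.0
    hits = sum(
        1
        for advisory, vfc in truth.items()
        if vfc in ranked.get(advisory, [])[:k]
    )
    return hits / len(truth)


# Toy data for illustration only; not results from the paper.
ranked = {
    "ADV-1": ["c3", "c1", "c9"],
    "ADV-2": ["c7", "c2", "c4"],
    "ADV-3": ["c5", "c8", "c6"],
    "ADV-4": ["c0", "c2", "c1"],
}
truth = {"ADV-1": "c3", "ADV-2": "c4", "ADV-3": "c6", "ADV-4": "c9"}

print(top_k_recall(ranked, truth, k=1))  # 0.25
print(top_k_recall(ranked, truth, k=3))  # 0.75
```

In this toy example only one of four advisories has its fixing commit ranked first, while three of four have it within the first three candidates, which is why recall grows with k.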