Tracking vulnerabilities inherited from third-party open-source software is a well-known challenge, often addressed by tracing the threads of dependency information. However, vulnerabilities can also propagate through forking: a code repository forked after the introduction of a vulnerability, but before it is patched, may remain vulnerable long after the vulnerability has been fixed in the initial repository. History analysis approaches are used to track vulnerable software versions at scale. However, such approaches fail to track vulnerabilities in forks, leaving fork maintainers to identify them manually. This paper presents a global history analysis approach to help software developers identify one-day (known but unpatched) vulnerabilities in forked repositories. Leveraging the global graph of public code, as captured by the Software Heritage archive, our approach propagates vulnerability information at the commit level and performs automated impact analysis. Starting from 7162 repositories with vulnerable commits listed in OSV, we propagate vulnerability information to 2.2 million forks. We evaluate our approach by filtering forks with significant user bases whose latest commit is still potentially vulnerable, manually auditing the code, and contacting maintainers for confirmation and responsible disclosure. This process identified 135 high-severity one-day vulnerabilities, achieving a precision of 0.69, with 9 confirmed by maintainers.
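The fork scenario described above can be illustrated with a minimal sketch of commit-level propagation. This is not the paper's implementation: it assumes a toy in-memory commit graph stored as a child-adjacency dictionary, a single introducing commit, and a single fixing commit. The function names (`descendants`, `vulnerable_commits`, `potentially_vulnerable_heads`) and the commit identifiers are hypothetical. A commit is treated as potentially vulnerable if it descends from the introducing commit but not from the fixing commit.

```python
from collections import deque


def descendants(children, start):
    """Return `start` plus every commit reachable from it via child edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        commit = queue.popleft()
        for child in children.get(commit, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def vulnerable_commits(children, introduced, fixed):
    """Commits that contain the introducing commit but not the fixing commit."""
    return descendants(children, introduced) - descendants(children, fixed)


def potentially_vulnerable_heads(children, introduced, fixed, heads):
    """Names of repositories whose latest commit is still potentially vulnerable."""
    vulnerable = vulnerable_commits(children, introduced, fixed)
    return {name for name, head in heads.items() if head in vulnerable}


# Toy history (hypothetical identifiers): "b" introduces the vulnerability,
# the fork branches off at "c", and upstream fixes the issue in "d".
children = {
    "a": ["b"],
    "b": ["c"],
    "c": ["d", "x"],
    "d": ["e"],
    "x": ["y"],
}
heads = {"upstream": "e", "fork": "y"}

result = potentially_vulnerable_heads(children, "b", "d", heads)
print(result)
```

In this toy history only the fork's head is flagged, because the fork diverged before the fix landed upstream. The sketch relies purely on graph reachability, so a fix applied in a fork under a different commit hash would not be recognized; flagged heads are candidates for further review rather than confirmed vulnerabilities.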