Modern software relies heavily on components. These components are usually published in central repositories and managed by build systems as dependencies. Because of issues around vulnerabilities, licenses, and the propagation of bugs, the study of those dependencies is of utmost importance, and numerous software composition analysis (SCA) tools have emerged for this purpose. A particular challenge is posed by hidden dependencies that result from cloning or shading, where code from a component is "inlined" and, in the case of shading, moved to a different namespace. We present a novel approach to detect vulnerable clones in the Maven repository. Our approach is lightweight in that it does not require the creation and maintenance of a custom index. Starting with 29 vulnerabilities with assigned CVEs and proof-of-vulnerability projects, we retrieve over 53k potentially vulnerable clones from Maven Central. Running our analysis on this set, we detect 727 confirmed vulnerable clones (86 if versions are aggregated) and synthesize a testable proof-of-vulnerability project for each of them. We demonstrate that existing SCA tools often miss those exposures. At the time of submission, these results have led to changes to the entries for six CVEs in the GitHub Security Advisory Database (GHSA) via accepted pull requests, with more pending.