Open-source software (OSS) dependencies are a dominant component of modern codebases. Using proven, well-tested OSS components lets developers reduce development time and cost while improving quality. However, heavy reliance on open-source software also introduces significant security risks, including the incorporation of known vulnerabilities into the codebase. To mitigate these risks, two kinds of dependency scanners have been developed: metadata-based scanners, which are lightweight and fast, and code-centric scanners, which can detect modified dependencies that metadata-based approaches miss. In this paper, we present Unshade, a hybrid approach to dependency scanning in Java that combines the efficiency of metadata-based scanning with the ability of code-centric approaches to detect modified dependencies. Unshade first augments a Java project's software bill of materials (SBOM) by identifying modified and hidden dependencies through a bytecode-based fingerprinting mechanism. The augmented SBOM is then passed to a metadata-based vulnerability scanner, which identifies known vulnerabilities in both the declared and the newly revealed dependencies. Leveraging Unshade's high scalability, we conducted a large-scale study of the 1,808 most popular open-source Java Maven projects on GitHub. The results show that nearly 50% of these projects contain at least one modified, hidden dependency associated with a known vulnerability. On average, each affected project includes more than eight such hidden vulnerable dependencies, all of which traditional metadata-based scanners miss. Overall, Unshade identified 7,712 unique CVEs in hidden dependencies that would go undetected by metadata-based scanning alone.
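To make the two-stage idea concrete, the following is a minimal sketch of SBOM augmentation by class-level fingerprinting. It is not Unshade's implementation. Everything in it is a hypothetical stand-in: the `Component` model, the helpers `class_fingerprints`, `augment_sbom`, and `build_jar`, the fingerprint index, and the `min_overlap` threshold of 0.8. Each `.class` entry in a JAR is hashed with SHA-256 over its raw bytes. A known library is reported as a hidden dependency when most of its indexed class fingerprints appear in the JAR but the library is absent from the declared SBOM. Because the hash is taken over raw bytes, this sketch would not match classes whose packages were relocated during shading, since relocation rewrites strings inside the bytecode. Handling that case would require some normalization step that the abstract does not describe. The resulting component list is what would then be handed to a metadata-based vulnerability scanner, a step this sketch omits.

```python
import hashlib
import io
import zipfile
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical minimal SBOM entry: a Maven coordinate."""
    group: str
    artifact: str
    version: str


def class_fingerprints(jar_bytes: bytes) -> set[str]:
    """Return SHA-256 digests of every .class entry in a JAR (raw bytes, no normalization)."""
    digests = set()
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        for name in jar.namelist():
            if name.endswith(".class"):
                digests.add(hashlib.sha256(jar.read(name)).hexdigest())
    return digests


def augment_sbom(declared: list[Component], jar_bytes: bytes,
                 index: dict[Component, set[str]],
                 min_overlap: float = 0.8) -> list[Component]:
    """Append undeclared components whose indexed class fingerprints mostly occur in the JAR.

    `index` is a hypothetical lookup table mapping each known library version
    to the fingerprints of its classes; `min_overlap` is an assumed threshold.
    """
    found = class_fingerprints(jar_bytes)
    augmented = list(declared)
    for component, known in index.items():
        if component in declared or not known:
            continue  # already declared, or nothing to compare against
        overlap = len(found & known) / len(known)
        if overlap >= min_overlap:
            augmented.append(component)  # hidden dependency revealed by bytecode
    return augmented


def build_jar(entries: dict[str, bytes]) -> bytes:
    """Build an in-memory JAR (a ZIP archive) from path -> bytes, for demonstration only."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        for name, data in entries.items():
            jar.writestr(name, data)
    return buf.getvalue()


if __name__ == "__main__":
    lib = Component("org.example", "lib", "1.0")
    declared = Component("org.example", "declared", "2.0")
    # Placeholder byte strings stand in for real class files bundled into a fat JAR.
    lib_classes = {"shaded/lib/A.class": b"A-bytes", "shaded/lib/B.class": b"B-bytes"}
    jar = build_jar({"app/Main.class": b"main-bytes", **lib_classes})
    index = {lib: {hashlib.sha256(b).hexdigest() for b in lib_classes.values()}}
    print(augment_sbom([declared], jar, index))
```

In this toy run, the bundled library's classes are present in the JAR but undeclared, so it is appended to the component list. A threshold on fingerprint overlap, rather than requiring an exact match, is one plausible way to tolerate dependencies that were only partially bundled or slightly modified.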