Uncovering Hidden Inclusions of Vulnerable Dependencies in Real-World Java Projects

Open-source software (OSS) dependencies are a dominant component of modern software codebases. Using proven and well-tested OSS components lets developers reduce development time and cost while improving quality. However, heavy reliance on open-source software also introduces significant security risks, including the incorporation of known vulnerabilities into the codebase. Two kinds of scanners have been developed to mitigate these risks: metadata-based dependency scanners, which are lightweight and fast, and code-centric scanners, which can detect modified dependencies that metadata-based approaches miss. In this paper, we present Unshade, a hybrid approach to dependency scanning in Java that combines the efficiency of metadata-based scanning with the ability of code-centric approaches to detect modified dependencies. Unshade first augments a Java project's software bill of materials (SBOM) by identifying modified and hidden dependencies via a bytecode-based fingerprinting mechanism. This augmented SBOM is then passed to a metadata-based vulnerability scanner to identify known vulnerabilities in both declared and newly revealed dependencies. Leveraging Unshade's high scalability, we conducted a large-scale study of the 1,808 most popular open-source Java Maven projects on GitHub. The results show that nearly 50% of these projects contain at least one modified, hidden dependency associated with a known vulnerability. On average, each affected project includes more than eight such hidden vulnerable dependencies, all missed by traditional metadata-based scanners. Overall, Unshade identified 7,712 unique CVEs in hidden dependencies that would remain undetected when relying on metadata-based scanning alone.
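The two-step idea described above (fingerprint the project's classes, add matching libraries to the SBOM, then hand the result to a metadata-based scanner) can be illustrated with a minimal Java sketch. This is not Unshade's actual algorithm, which the abstract does not detail. Everything here is an assumption made for illustration: each class is modeled as a plain summary string rather than real bytecode, the `normalize` step simply strips package prefixes so that relocated (shaded) classes still match, the fingerprint index is an in-memory map, and the class name `HiddenDependencySketch`, the Maven-style coordinates, and the overlap threshold are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only: class summaries stand in for parsed bytecode.
class HiddenDependencySketch {

    // Hypothetical normalization: drop package prefixes ("a/b/c/") so that a
    // class relocated into another package yields the same fingerprint.
    public static String normalize(String classSummary) {
        return classSummary.replaceAll("([A-Za-z_$][\\w$]*/)+", "");
    }

    // Fingerprint = SHA-256 (hex) of the normalized class summary.
    public static String fingerprint(String classSummary) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha.digest(
                    normalize(classSummary).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is required on every JVM
        }
    }

    public static Set<String> fingerprints(List<String> classSummaries) {
        Set<String> out = new LinkedHashSet<>();
        for (String summary : classSummaries) {
            out.add(fingerprint(summary));
        }
        return out;
    }

    // Returns the declared SBOM entries plus every indexed library whose class
    // fingerprints are found in the project at or above the given fraction.
    public static Set<String> augmentSbom(Set<String> declared,
                                          List<String> projectClasses,
                                          Map<String, Set<String>> index,
                                          double threshold) {
        Set<String> projectPrints = fingerprints(projectClasses);
        Set<String> augmented = new TreeSet<>(declared);
        for (Map.Entry<String, Set<String>> lib : index.entrySet()) {
            Set<String> libPrints = lib.getValue();
            if (libPrints.isEmpty()) {
                continue;
            }
            long hits = libPrints.stream().filter(projectPrints::contains).count();
            if ((double) hits / libPrints.size() >= threshold) {
                augmented.add(lib.getKey());
            }
        }
        return augmented;
    }
}
```

In this sketch, the set returned by `augmentSbom` plays the role of the augmented SBOM; passing it to a metadata-based vulnerability scanner is left out. A real implementation would need a fingerprint that is derived from actual class files and that tolerates the modifications applied to hidden dependencies, which simple prefix stripping does not.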

