Cross-Ecosystem Vulnerability Analysis for Python Applications

Python applications depend on third-party native libraries that may be vendored within package distributions or installed on the host system. When a vulnerability is discovered in one of these native libraries, determining which Python packages are affected requires analysis across ecosystem boundaries, from Python dependency graphs to OS distribution packages. Current vulnerability scanners produce false negatives by overlooking vulnerabilities in vendored native libraries, and false positives by failing to account for security patches backported by OS distributions. We present a provenance-aware vulnerability analysis approach that resolves vendored libraries to specific OS package versions or upstream project releases. Our approach queries vendored libraries against a database of historical OS package artifacts using content-based hashing, and applies library-specific dynamic analyses to extract version information from binaries built from upstream source. We then construct cross-ecosystem call graphs by stitching together Python and binary call graphs across dependency boundaries, enabling reachability analysis of vulnerable functions. In an evaluation on 100,000 Python packages and 10 known CVEs associated with third-party native dependencies, we identify 39 directly vulnerable packages (47M+ monthly downloads) and 312 indirectly vulnerable client packages affected through dependency chains. Our analysis reduces false positives by 52% on average compared to upstream version matching, and by up to 97% for heavily patched libraries. We responsibly disclosed all findings to maintainers; 54 issues have been fixed to date.
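
As a rough illustration of the two mechanisms named above (a content-hash lookup of a vendored library against known OS package artifacts, and reachability over a stitched Python/binary call graph), consider the following minimal sketch. Everything in it is an assumption made for illustration: the choice of whole-file SHA-256, the in-memory `artifact_index`, the dictionary-based graph representation, and all package, function, and symbol names are hypothetical and are not taken from the approach's actual implementation.

```python
import hashlib
from collections import deque


def content_hash(data: bytes) -> str:
    """Hash the raw bytes of a library file (assumption: whole-file SHA-256)."""
    return hashlib.sha256(data).hexdigest()


def resolve_provenance(data: bytes, artifact_index: dict):
    """Return the (distro, package, version) recorded for these bytes, or None."""
    return artifact_index.get(content_hash(data))


def stitch(*graphs: dict) -> dict:
    """Merge adjacency maps (caller -> set of callees) into one call graph."""
    stitched = {}
    for graph in graphs:
        for caller, callees in graph.items():
            stitched.setdefault(caller, set()).update(callees)
    return stitched


def reachable(graph: dict, entry: str, target: str) -> bool:
    """Breadth-first search: can `entry` transitively call `target`?"""
    seen = {entry}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for callee in graph.get(node, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return False


# Hypothetical data: a vendored library whose bytes match a known OS package.
vendored_bytes = b"placeholder bytes standing in for a vendored libfoo.so"
artifact_index = {
    content_hash(vendored_bytes): ("debian", "libfoo1", "1.2.3-4"),
}

# Hypothetical call graphs: Python-level edges, binary-level edges, and the
# bridge edges that connect a Python extension function to a native symbol.
python_graph = {"client.main": {"wrapper.decode"}}
bridge_edges = {"wrapper.decode": {"libfoo:foo_decode"}}
binary_graph = {"libfoo:foo_decode": {"libfoo:foo_parse_header"}}

call_graph = stitch(python_graph, bridge_edges, binary_graph)
provenance = resolve_provenance(vendored_bytes, artifact_index)
is_exposed = reachable(call_graph, "client.main", "libfoo:foo_parse_header")
```

An exact hash of this kind only matches byte-identical files, which is why a lookup miss would have to fall back on some other version-recovery step, such as the dynamic analyses mentioned above. Resolving to a distribution package version rather than an upstream version is what would allow backported patches to be taken into account.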
