The rapid expansion of the Python ecosystem has fueled two distinct but converging threats: adversaries increasingly target the software supply chain through the Python Package Index (PyPI), and they also build evasive, cross-platform malicious binaries compiled from Python source code. Current program analysis techniques struggle to address this dual threat. Static-analysis-based tools are often blinded by runtime obfuscation and compiled bytecode, while dynamic-analysis-based tools are fragile, prone to evasion by environmental guardrails, and often terminate prematurely due to unsatisfied dependencies. To overcome these limitations, we present PyFEX, a resilient forced-execution engine. PyFEX systematically explores a program's behavioral space by forcing execution across all conditional branches to bypass evasion checks. To address the fragility of dynamic execution, it introduces a novel resilient crash recovery mechanism that synthesizes dummy objects to satisfy failed operations at runtime, allowing analysis to proceed past fatal errors, and it employs path merging to mitigate path explosion. PyFEX further incorporates an automated entry identification mechanism that proactively discovers and invokes dormant functions, exposing malicious logic hidden within uncalled APIs. To demonstrate the efficacy of this engine, we implemented PyFEXScan, a proof-of-concept malware detector built on top of PyFEX. Evaluated against both known malicious PyPI packages and real-world compiled binaries, PyFEX exposes critical behaviors missed by existing state-of-the-art tools. In a live deployment on PyPI, PyFEXScan discovered 212 previously unknown malicious packages accounting for more than 91,648 downloads, underscoring the necessity of resilient, exhaustive analysis for securing the Python ecosystem.
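To make the crash-recovery idea concrete, the following is a minimal sketch of how a dummy object might stand in for an unsatisfied dependency so that execution continues past an otherwise fatal `ImportError`. It is an illustration under stated assumptions, not PyFEX's implementation: the names `Dummy`, `run_with_recovery`, and `DEMO_SOURCE` are hypothetical, and the sketch recovers only from failed imports, whereas the abstract describes recovery from failed operations more generally.

```python
import builtins


class Dummy:
    """Hypothetical placeholder that absorbs attribute access and calls."""

    def __init__(self, name="dummy"):
        self._name = name

    def __getattr__(self, attr):
        # Only reached for attributes that do not exist on the instance.
        return Dummy(f"{self._name}.{attr}")

    def __call__(self, *args, **kwargs):
        return Dummy(f"{self._name}()")

    def __iter__(self):
        return iter(())

    def __repr__(self):
        return f"<Dummy {self._name}>"


def run_with_recovery(source):
    """Execute `source`, substituting a Dummy for every import that fails.

    Returns the resulting namespace and the list of module names recovered.
    """
    recovered = []
    real_import = builtins.__import__

    def tolerant_import(name, globals=None, locals=None, fromlist=(), level=0):
        try:
            return real_import(name, globals, locals, fromlist, level)
        except ImportError:
            # Assumption for this sketch: log the failure and keep going.
            recovered.append(name)
            return Dummy(name)

    # Give the executed code its own builtins with the tolerant import hook.
    sandbox_builtins = {**vars(builtins), "__import__": tolerant_import}
    namespace = {"__builtins__": sandbox_builtins}
    exec(source, namespace)
    return namespace, recovered


# A script that would normally crash on its first line.
DEMO_SOURCE = """
import nonexistent_pkg_for_demo
token = nonexistent_pkg_for_demo.get_token()
reached = True
"""

namespace, recovered = run_with_recovery(DEMO_SOURCE)
```

In this sketch the script's later statements still run, so behavior that sits behind the missing dependency remains observable. A real engine would also need to handle failures inside arbitrary operations and to execute untrusted code in an isolated environment, neither of which this example attempts.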