Control Flow Graph Recovery for Dynamically Loaded Code via Symbolic Library Resolution

Control Flow Graphs (CFGs) are one of the main data sources for software analysis, whether it relies on static or dynamic methods. Protected software and modern malware increasingly rely on dynamic code loading to evade static analysis. Runtime dynamic linking introduces unresolved indirect calls that halt static CFG recovery, and the resulting concealment of dynamically loaded libraries can be exploited to hinder security analysis. To address this limitation, an analysis technique is proposed that combines symbolic execution with speculative library preloading to recover CFGs from binaries that use dynamic loading. The method uses custom software hooks that intercept dynamic loading operations during symbolic execution and load the actual libraries into the analysis state. The module is built on a two-level architecture that maintains interception functions and instruction tracking simultaneously, entirely within a symbolic execution environment. Because dynamic instrumentation tools must execute potentially malicious code, the analysis is instead conducted entirely through symbolic execution, which makes it safe for malware analysis. The evaluation used a set of 16 synthetic benchmarks employing various obfuscation techniques, including encrypted library names, network-triggered loading, environment-derived paths, multi-stage decryption chains, fileless execution, and manual Executable and Linkable Format (ELF) parsing. The experimental results show that the module recovers on average 29.8 % more CFG nodes and 26.5 % more edges than static analysis alone and achieves 100 % precision and 100 % recall in library detection, with all discoveries validated through Frida-based dynamic instrumentation.
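The interplay between the two levels (interception functions and call-site tracking) and the way a resolved library symbol turns an unresolved indirect call into a CFG edge can be illustrated with a small toy model. This is a minimal sketch under loud assumptions, not the authors' module: it replaces symbolic execution with a single concrete path for brevity, and every name in it (`LIBRARY_CATALOG`, `AnalysisState`, `DynamicLoadHooks`, `explore_loader`, `relative_gain`, the addresses, and the XOR-encrypted library name) is hypothetical. Real symbolic execution engines expose their own hooking facilities (for example, angr's `SimProcedure` mechanism); the abstract does not say which engine or interface the module uses. The "additional edges" percentage is likewise an assumed reading of the metric as a relative increase over the static result.

```python
from dataclasses import dataclass, field

# Hypothetical catalog of libraries the analyst can preload speculatively:
# library name -> {exported symbol: offset inside the library}.
LIBRARY_CATALOG = {
    "libpayload.so": {"run_payload": 0x120, "cleanup": 0x180},
    "libnet.so": {"beacon": 0x40},
}


@dataclass
class AnalysisState:
    """Toy analysis state recording which libraries are mapped, and where."""
    handles: dict = field(default_factory=dict)  # handle -> (name, base)
    next_base: int = 0x7F000000
    next_handle: int = 1


class DynamicLoadHooks:
    """Level 1: interception functions. Level 2: call-site tracking."""

    def __init__(self, state: AnalysisState):
        self.state = state
        # Level 1: hooks keyed by the intercepted loader function name.
        self.intercept = {"dlopen": self._dlopen, "dlsym": self._dlsym}
        # Level 2: (call_site, kind, value) records of what each site did.
        self.tracked = []

    def _dlopen(self, call_site, name):
        if name not in LIBRARY_CATALOG:
            return 0  # NULL handle: library unknown to the analysis
        handle = self.state.next_handle
        # "Load" the library by mapping it at a fresh base address.
        self.state.handles[handle] = (name, self.state.next_base)
        self.state.next_handle += 1
        self.state.next_base += 0x10000
        self.tracked.append((call_site, "load", name))
        return handle

    def _dlsym(self, call_site, handle, symbol):
        if handle not in self.state.handles:
            return 0
        name, base = self.state.handles[handle]
        offset = LIBRARY_CATALOG[name].get(symbol)
        if offset is None:
            return 0
        self.tracked.append((call_site, "resolve", base + offset))
        return base + offset

    def call(self, call_site, func, *args):
        """Dispatch an intercepted call to its level-1 hook."""
        return self.intercept[func](call_site, *args)


def xor_decode(data: bytes, key: int) -> str:
    return bytes(b ^ key for b in data).decode()


def explore_loader(hooks: DynamicLoadHooks):
    """Walk one path of a hypothetical loader that hides its library name."""
    encrypted = bytes(b ^ 0x5A for b in b"libpayload.so")
    handle = hooks.call(0x401010, "dlopen", xor_decode(encrypted, 0x5A))
    target = hooks.call(0x401020, "dlsym", handle, "run_payload")
    # The indirect call at 0x401030 is unresolved statically; here it
    # becomes a concrete edge to the resolved library function.
    return {(0x401030, target)} if target else set()


def relative_gain(static: set, combined: set) -> float:
    """Assumed metric: percentage of elements added on top of the static result."""
    return 100.0 * len(combined - static) / len(static)


static_edges = {(0x401000, 0x401010), (0x401010, 0x401020), (0x401020, 0x401030)}
hooks = DynamicLoadHooks(AnalysisState())
new_edges = explore_loader(hooks)
all_edges = static_edges | new_edges
print(sorted(hex(dst) for _, dst in new_edges))  # ['0x7f000120']
print(round(relative_gain(static_edges, all_edges), 1))  # 33.3
```

Keeping the hook table and the tracking log separate mirrors the two-level idea: the hooks decide what the loader call returns inside the analysis state, while the log records which call site produced which library load or symbol address, which is what later lets an indirect call be connected to its target in the recovered CFG.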

翻译：控制流图是采用动态和静态软件分析方法进行软件分析的主要数据源之一。受保护软件和现代恶意软件日益依赖动态代码加载技术来规避静态分析。运行时动态链接机制的使用会引入未解析的间接调用，阻碍静态控制流图的恢复，从而隐藏可用于阻碍安全分析的动态库。为解决这一局限性，本文提出了一种结合符号执行与推测性库预加载的分析技术，通过利用动态加载从二进制文件中恢复控制流图。该方法使用自定义软件钩子，在符号执行期间拦截动态加载操作，并将实际库加载到分析状态中。该模块基于双层架构，在符号执行环境中同时存储拦截函数和指令跟踪信息。为避免执行动态插桩工具所需的潜在恶意代码，分析完全通过符号执行进行，保障了恶意软件分析的安全性。为进行评估，实验采用了16个合成基准测试程序，并运用了多种混淆技术，包括加密库名、网络触发加载、环境派生路径、多级解密链、无文件执行以及手动可执行与可链接格式解析。实验结果表明，与纯静态分析相比，该模块平均多恢复29.8%的控制流图节点和26.5%的边，在库检测中实现了100%的精确率和100%的召回率，所有发现均通过基于Frida的动态插桩进行了验证。