Malware often uses obfuscation to hinder security analysis. Among these techniques, virtualization-based obfuscation is particularly strong: it protects a program by translating its original instructions into bytecode for an attacker-defined virtual machine (VM), producing long and complex code that is difficult to analyze and deobfuscate. This paper aims to identify the structural components of virtualization-based obfuscation through static analysis. By examining the execution model of obfuscated code, we define the key elements required for deobfuscation (the dispatch routine, the handler blocks, and the VM region) and detect them at the LLVM IR level. Experimental results show that, in the absence of compiler optimizations, the proposed LLVM pass detects all of these core structures across the major virtualization options, including the switch, direct, and indirect modes.
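To make the three structural elements concrete, the following is a minimal toy sketch of a switch-style virtualized routine. It is not the paper's detection pass or any real obfuscator's output; the opcode names (`PUSH`, `ADD`, `MUL`, `HALT`), the function `run_vm`, and the stack-based design are all assumptions chosen for illustration. The comments mark which parts would correspond to the VM region, the dispatch routine, and the handler blocks. The direct and indirect modes are not shown.

```python
# Toy illustration only: hypothetical opcodes and a hypothetical stack VM,
# not the paper's method or the output of a real obfuscator.
PUSH, ADD, MUL, HALT = range(4)


def run_vm(bytecode):
    """Interpret bytecode with a switch-style dispatch loop."""
    stack = []
    vpc = 0  # virtual program counter into the bytecode array

    # VM region: the loop that encloses the dispatch routine and all handlers.
    while True:
        # Dispatch routine: fetch the next opcode and select a handler.
        opcode = bytecode[vpc]
        vpc += 1

        # Handler blocks: one branch per virtual instruction.
        if opcode == PUSH:
            stack.append(bytecode[vpc])  # operand follows the opcode
            vpc += 1
        elif opcode == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif opcode == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif opcode == HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {opcode} at offset {vpc - 1}")


# The original computation (2 + 3) * 4, expressed as virtual bytecode.
program = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT]
result = run_vm(program)
```

Even in this small case, the original arithmetic is no longer visible as native instructions; an analyst sees only the interpreter loop and an opaque data array, which is why locating the dispatch routine and the handlers is a natural first step toward deobfuscation.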