Malware often relies on self-protection techniques that complicate analysis. One such technique uses Mixed-Boolean Arithmetic (MBA) expressions, which mix arithmetic operations (+, -, *) with bitwise operations (AND, OR, XOR, NOT), to create opaque predicates and to diversify and obfuscate the data flow. In this work, we provide tools for simplifying nonlinear MBA expressions in a practical setting, so that analysis can keep pace in the arms race against the generation of hard, diverse MBAs. The proposed algorithm, GAMBA, extends SiMBA and uses algebraic rewriting at its core. It efficiently deobfuscates MBA expressions from the most widely tested public datasets, simplifies them to their ground truths in most cases, and outperforms peer tools.
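To make the notion of an MBA expression concrete, here is a minimal sketch. It uses two standard textbook identities chosen only for illustration, not taken from the GAMBA datasets. The first is a linear MBA, `x + y == (x ^ y) + 2*(x & y)`. The second is a nonlinear MBA, `x * y == (x & y)*(x | y) + (x & ~y)*(~x & y)`. It also builds an always-true opaque predicate from the first identity. The 8-bit word size is an assumption made so that equivalence can be checked exhaustively; real obfuscated code typically works on 32- or 64-bit words. The helper names are hypothetical. The sketch only verifies equivalence by brute force and does not implement GAMBA's algebraic rewriting.

```python
# Hypothetical helpers illustrating MBA identities on 8-bit words (arithmetic mod 2**8).
BITS = 8                  # assumed word size, small enough to check every input pair
MASK = (1 << BITS) - 1    # reduce results modulo 2**BITS


def linear_mba(x: int, y: int) -> int:
    # Obfuscated form of x + y: XOR is the carry-less sum, AND marks the carry bits.
    return ((x ^ y) + 2 * (x & y)) & MASK


def nonlinear_mba(x: int, y: int) -> int:
    # Obfuscated form of x * y built from products of bitwise terms.
    nx, ny = ~x & MASK, ~y & MASK
    return ((x & y) * (x | y) + (x & ny) * (nx & y)) & MASK


def opaque_predicate(x: int, y: int) -> bool:
    # Always True, but this fact is hidden behind an MBA expression.
    return (linear_mba(x, y) - x - y) & MASK == 0


def holds_for_all(f, g) -> bool:
    # Exhaustively compare two 2-variable expressions over all 8-bit inputs.
    return all(f(x, y) == g(x, y)
               for x in range(1 << BITS) for y in range(1 << BITS))


if __name__ == "__main__":
    print(holds_for_all(linear_mba, lambda x, y: (x + y) & MASK))
    print(holds_for_all(nonlinear_mba, lambda x, y: (x * y) & MASK))
```

Both checks print `True`. Exhaustive testing like this does not scale to 64-bit words or many variables, which is why deobfuscators such as SiMBA and GAMBA rely on algebraic reasoning rather than brute force.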