When attempting to understand the behavior of an executable, a binary analyst can make use of many different techniques. These include program slicing, dynamic instrumentation, binary-level rewriting, symbolic execution, and formal verification, all of which can uncover insights into how a piece of machine code behaves. As a result, there is no one-size-fits-all binary analysis tool, so a binary analysis researcher will often combine several different tools. Sometimes, a researcher will even need to design new tools to study problems that existing frameworks are not well equipped to handle. Designing such tools from complete scratch is rarely time- or cost-effective, however, given the scale and complexity of modern instruction set architectures. We present Macaw, a modular framework that makes it possible to rapidly build reliable binary analysis tools across a range of use cases. Over a decade of development, we have used Macaw to support an industrial research team in building tools for machine code-related tasks. As such, the name "Macaw" refers not just to the framework itself, but also a suite of tools that are built on top of the framework. We describe Macaw in depth and describe the different static and dynamic analyses that it performs, many of which are powered by an SMT-based symbolic execution engine. We put a particular focus on interoperability between machine code and higher-level languages, including binary lifting from x86 to LLVM, as well verifying the correctness of mixed C and assembly code.
翻译:在尝试理解可执行文件行为时,二进制分析师可采用多种技术手段。这些技术包括程序切片、动态插桩、二进制重写、符号执行和形式化验证,所有这些方法都能揭示机器代码的行为特征。因此,并不存在适用于所有场景的二进制分析工具,二进制分析研究人员往往需要组合多种工具。有时,研究人员甚至需要设计新工具来处理现有框架无法妥善解决的问题。然而,考虑到现代指令集架构的规模与复杂性,完全从零开始设计此类工具在时间和成本上往往效率低下。本文提出Macaw——一个模块化框架,能够支持快速构建适用于多种应用场景的可靠二进制分析工具。经过十余年的开发,我们已运用Macaw为工业研究团队构建了多种机器代码相关任务工具。因此,“Macaw”不仅指代框架本身,还包含基于该框架构建的一系列工具集。我们将深入阐述Macaw框架,详细介绍其支持的静态与动态分析功能,其中多项功能由基于SMT的符号执行引擎驱动。我们特别关注机器代码与高级语言之间的互操作性,包括从x86到LLVM的二进制提升技术,以及混合C语言与汇编代码的正确性验证。