Macaw: A Machine Code Toolbox for the Busy Binary Analyst

Ryan G. Scott,Brett Boston,Benjamin Davis,Iavor Diatchki,Mike Dodds,Joe Hendrix,Daniel Matichuk,Kevin Quick,Tristan Ravitch,Valentin Robert,Benjamin Selfridge,Andrei Stefănescu,Daniel Wagner,Simon Winwood

When attempting to understand the behavior of an executable, a binary analyst can make use of many different techniques. These include program slicing, dynamic instrumentation, binary-level rewriting, symbolic execution, and formal verification, all of which can uncover insights into how a piece of machine code behaves. As a result, there is no one-size-fits-all binary analysis tool, so a binary analysis researcher will often combine several different tools. Sometimes, a researcher will even need to design new tools to study problems that existing frameworks are not well equipped to handle. Designing such tools from complete scratch is rarely time- or cost-effective, however, given the scale and complexity of modern instruction set architectures. We present Macaw, a modular framework that makes it possible to rapidly build reliable binary analysis tools across a range of use cases. Over a decade of development, we have used Macaw to support an industrial research team in building tools for machine code-related tasks. As such, the name "Macaw" refers not just to the framework itself, but also a suite of tools that are built on top of the framework. We describe Macaw in depth and describe the different static and dynamic analyses that it performs, many of which are powered by an SMT-based symbolic execution engine. We put a particular focus on interoperability between machine code and higher-level languages, including binary lifting from x86 to LLVM, as well verifying the correctness of mixed C and assembly code.

翻译：在尝试理解可执行文件的行为时，二进制分析师可采用多种不同技术。这些技术包括程序切片、动态插桩、二进制级重写、符号执行和形式化验证，所有这些方法都能揭示机器代码的行为方式。因此，不存在适用于所有场景的二进制分析工具，二进制分析研究人员通常需要组合多种不同工具。有时，研究人员甚至需要设计新工具来研究现有框架无法妥善处理的问题。然而，考虑到现代指令集架构的规模和复杂性，完全从零开始设计此类工具在时间或成本上往往不具效益。我们提出Macaw——一个模块化框架，能够支持快速构建适用于多种使用场景的可靠二进制分析工具。经过十余年的开发，我们已运用Macaw支持工业研究团队构建用于机器代码相关任务的工具。因此，“Macaw”不仅指代框架本身，也包含构建于该框架之上的一系列工具。我们将深入阐述Macaw的设计，并描述其执行的静态与动态分析功能，其中多项功能由基于SMT的符号执行引擎驱动。我们特别关注机器代码与高级语言之间的互操作性，包括从x86到LLVM的二进制提升，以及混合C语言与汇编代码的正确性验证。