Reverse engineering tools remain monolithic and imperative, lagging behind modern compiler architectures: analyses are tied to a single mutable representation, which makes them difficult to extend or refine and forces premature choices between soundness and precision. We observe that decompilation is the reverse of compilation and can be structured as a sequence of modular passes, each performing a granular, clearly defined interpretation of the binary at a progressively higher level of abstraction. We formalize this as provenance-guided superset decompilation (PGSD), a framework that monotonically derives facts about the binary into a relation store. Instead of committing early to a single interpretation, the pipeline retains ambiguous interpretations as parallel candidates with provenance, deferring resolution until the final selection phase. Manifold implements PGSD as a declarative reverse engineering framework, written in ~35K lines of Rust and Datalog, that lifts Linux ELF binaries to C99 through a granular intermediate representation. On GNU coreutils, Manifold's output quality matches that of Ghidra, IDA Pro, angr, and RetDec on multiple metrics while producing fewer compiler errors, and it generalizes across compilers and optimization levels.
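To make the idea of a monotone relation store with provenance concrete, the following is a minimal Rust sketch, not Manifold's actual implementation or API. Every name in it (`Fact`, `RelationStore`, `derive`, `candidates`, `select`, and the pass names `linear_sweep`, `jump_table_scan`, `recursive_descent`) is hypothetical, and the selection rule (prefer the candidate supported by the most passes) is an assumption chosen only for illustration.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical fact: one candidate interpretation of the bytes at an address.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
enum Fact {
    Code { addr: u64 },
    Data { addr: u64 },
}

impl Fact {
    fn addr(&self) -> u64 {
        match self {
            Fact::Code { addr } | Fact::Data { addr } => *addr,
        }
    }
}

/// Append-only store: facts are only ever added, never removed or mutated.
#[derive(Default)]
struct RelationStore {
    /// Each fact maps to the set of passes that derived it (its provenance).
    facts: BTreeMap<Fact, BTreeSet<&'static str>>,
}

impl RelationStore {
    /// Record that `pass` derived `fact`. Returns true if this added new
    /// provenance, false if the same pass had already derived the same fact.
    fn derive(&mut self, fact: Fact, pass: &'static str) -> bool {
        self.facts.entry(fact).or_default().insert(pass)
    }

    /// All parallel candidates for an address, each with its provenance.
    fn candidates(&self, addr: u64) -> Vec<(&Fact, &BTreeSet<&'static str>)> {
        self.facts.iter().filter(|(f, _)| f.addr() == addr).collect()
    }

    /// Deferred resolution (assumed heuristic): pick the candidate
    /// supported by the largest number of passes.
    fn select(&self, addr: u64) -> Option<&Fact> {
        self.candidates(addr)
            .into_iter()
            .max_by_key(|(_, prov)| prov.len())
            .map(|(f, _)| f)
    }
}

/// Builds a small store in which two interpretations of 0x1000 coexist.
fn demo() -> RelationStore {
    let mut store = RelationStore::default();
    store.derive(Fact::Code { addr: 0x1000 }, "linear_sweep");
    store.derive(Fact::Data { addr: 0x1000 }, "jump_table_scan");
    store.derive(Fact::Code { addr: 0x1000 }, "recursive_descent");
    store
}
```

In this sketch, `demo()` leaves both the code and the data interpretation of address `0x1000` in the store, and only `select` chooses between them. Because `derive` never deletes anything, a later pass can add evidence for either candidate without invalidating earlier results.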