Augmenting Smart Contract Decompiler Output through Fine-grained Dependency Analysis and LLM-facilitated Semantic Recovery

A decompiler is a specialized reverse engineering tool that is widely used in program analysis tasks, particularly program comprehension and vulnerability detection. However, current Solidity smart contract decompilers face significant limitations in reconstructing the original source code. In particular, state-of-the-art (SOTA) decompilers suffer from three bottlenecks: inaccurate method identification, incorrect variable type recovery, and missing contract attributes. These deficiencies hinder downstream tasks and make the program logic harder to understand. To address these challenges, we propose SmartHalo, a new framework that enhances decompiler output by combining static analysis (SA) and large language models (LLMs). SmartHalo leverages the complementary strengths of SA's accuracy in control and data flow analysis and the LLM's capability in semantic prediction. More specifically, SmartHalo constructs a new data structure, the Dependency Graph (DG), to extract semantic dependencies via static analysis. It then uses the DG to create prompts that ask the LLM to optimize the decompiled code. Finally, the correctness of the LLM outputs is validated through symbolic execution and formal verification. Evaluation on a dataset of 465 randomly selected smart contract methods shows that SmartHalo significantly improves the quality of the decompiled code compared to SOTA decompilers (e.g., Gigahorse). Notably, integrating GPT-4o with SmartHalo further enhances its performance, achieving precision rates of 87.39% for method boundaries, 90.39% for variable types, and 80.65% for contract attributes.
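As a rough illustration of the "dependency graph to prompt" step described above, the following sketch stores decompiled statements as nodes, links them with labeled dependency edges, and collects the statements reachable from a target statement into an LLM prompt. This is not SmartHalo's implementation: the class `DependencyGraph`, the function `build_prompt`, the edge labels, the prompt wording, and the sample statements are all hypothetical, and the validation step (symbolic execution and formal verification) is omitted.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class DependencyGraph:
    """Toy dependency graph (hypothetical): nodes are decompiled statements,
    edges are labeled dependencies between them."""
    statements: dict = field(default_factory=dict)  # node id -> statement text
    edges: dict = field(default_factory=lambda: defaultdict(list))  # id -> [(kind, id)]

    def add_statement(self, node_id, text):
        self.statements[node_id] = text

    def add_dependency(self, src, dst, kind):
        # `kind` is a free-form label such as "data" or "control" (assumed labels).
        self.edges[src].append((kind, dst))

    def related(self, start):
        """Return ids of all statements reachable from `start`, breadth-first."""
        seen, order, queue = {start}, [start], deque([start])
        while queue:
            node = queue.popleft()
            for _kind, nxt in self.edges[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    order.append(nxt)
                    queue.append(nxt)
        return order


def build_prompt(graph, target):
    """Assemble a prompt from the target statement and its dependents.
    The prompt wording is an assumption made for this sketch."""
    context = [graph.statements[n] for n in graph.related(target)]
    header = ("Infer the Solidity type of the variable defined in the first "
              "statement, using the dependent statements as context:")
    return header + "\n" + "\n".join(context)


# Hypothetical decompiler output: a calldata word masked to 160 bits and stored.
g = DependencyGraph()
g.add_statement("s1", "v1 = calldataload(4)")
g.add_statement("s2", "v2 = v1 & 0xffffffffffffffffffffffffffffffffffffffff")
g.add_statement("s3", "storage[0] = v2")
g.add_dependency("s1", "s2", "data")
g.add_dependency("s2", "s3", "data")

prompt = build_prompt(g, "s1")
print(prompt)
```

In this toy example the 160-bit mask in the dependent statement is the kind of context that would hint at an `address` type. The point of the sketch is that only statements connected to the target by dependencies are placed in the prompt, rather than the whole decompiled method.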
