Millions of smart contracts have been deployed on Ethereum to provide various services, and their functions can be invoked by other parties. To invoke a function, the caller needs to know the callee's function signature, which consists of the function id and the parameter types. Such signatures are critical to many applications that target smart contracts, e.g., reverse engineering, fuzzing, attack detection, and profiling. Unfortunately, recovering function signatures from contract bytecode is challenging, since the bytecode contains neither debug information nor type information. Prior approaches rely on source code, on collections of known signatures from incomplete databases, or on incomplete heuristic rules; these are far from adequate and cannot cope with the rapid growth of new contracts. In this paper, we propose a novel solution that leverages how the Ethereum Virtual Machine (EVM) handles functions to recover function signatures automatically. In particular, we exploit how smart contracts determine which function to invoke in order to locate and extract function ids, and we propose a new approach named type-aware symbolic execution (TASE) that uses the semantics of EVM operations on parameters to identify the number and types of the parameters. Moreover, we develop SigRec, a new tool that recovers function signatures from contract bytecode without requiring source code or function signature databases. Extensive experimental results show that SigRec outperforms all existing tools, achieving an unprecedented 98.7 percent accuracy within 0.074 seconds. We further demonstrate that the recovered function signatures are useful in attack detection, fuzzing, and reverse engineering of EVM bytecode.
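To make the idea of locating function ids through the function-selection logic concrete, here is a minimal Python sketch. It is not SigRec's algorithm. It assumes the dispatcher layout commonly emitted by Solidity compilers, in which each candidate id is pushed with `PUSH4`, compared with `EQ`, and followed by a push of the jump target and a `JUMPI`. The helper names `disassemble` and `extract_function_ids` are hypothetical, and bytecode that uses a different dispatch layout would not be matched.

```python
# Opcode values from the EVM instruction set.
PUSH1, PUSH4, PUSH32 = 0x60, 0x63, 0x7F
EQ, JUMPI = 0x14, 0x57


def disassemble(code: bytes):
    """Yield (offset, opcode, immediate) triples, skipping over PUSH data."""
    pc = 0
    while pc < len(code):
        op = code[pc]
        # PUSH1..PUSH32 carry 1..32 immediate bytes; other opcodes carry none.
        width = op - PUSH1 + 1 if PUSH1 <= op <= PUSH32 else 0
        yield pc, op, code[pc + 1 : pc + 1 + width]
        pc += 1 + width


def extract_function_ids(code: bytes) -> list[str]:
    """Collect ids found in a PUSH4 / EQ / PUSHn / JUMPI sequence (assumed layout)."""
    ins = list(disassemble(code))
    ids = []
    for i in range(len(ins) - 3):
        (_, op0, imm), (_, op1, _), (_, op2, _), (_, op3, _) = ins[i : i + 4]
        if (op0 == PUSH4 and op1 == EQ
                and PUSH1 <= op2 <= PUSH32 and op3 == JUMPI):
            fid = "0x" + imm.hex()
            if fid not in ids:
                ids.append(fid)
    return ids


# Hand-assembled dispatcher fragment (illustrative, not a real contract):
# DUP1 PUSH4 a9059cbb EQ PUSH2 0010 JUMPI DUP1 PUSH4 70a08231 EQ PUSH2 0020 JUMPI
demo = bytes.fromhex("8063a9059cbb1461001057" "806370a082311461002057")
print(extract_function_ids(demo))  # ['0xa9059cbb', '0x70a08231']
```

The two ids in the demo are the well-known ids of `transfer(address,uint256)` and `balanceOf(address)`. A pattern scan like this recovers only the ids; identifying the number and types of parameters is the part the abstract attributes to TASE and is not attempted here.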