Smart contracts have recently come to play a vital role in automated financial and business transactions. To help end users without a programming background better understand the logic of smart contracts, previous studies have proposed models that automatically translate smart contract source code into corresponding code summaries. In practice, however, only 13% of the smart contracts deployed on the Ethereum blockchain are associated with source code, which significantly restricts the practical usefulness of these existing tools. Since bytecode is always required to deploy a smart contract, in this paper we introduce, for the first time, the task of automatically generating smart contract code summaries from bytecode. We propose a novel approach, SmartBT (Smart contract Bytecode Translator), which translates smart contract bytecode directly into fine-grained natural language descriptions. This task poses two key challenges: the structural code logic hidden in bytecode, and the large semantic gap between bytecode and natural language descriptions. To address the first challenge, we transform bytecode into a control-flow graph (CFG) from which the model learns the structural and logical details of the code. To address the second, we introduce an information retrieval component that fetches similar comments to bridge the semantic gap. The structural input and the semantic input are then used to build an attentional sequence-to-sequence neural network model. In this model, a copy mechanism copies rare words directly from the retrieved similar comments, and a coverage mechanism suppresses repetitive outputs. Automatic evaluation results show that SmartBT outperforms a set of baselines by a large margin. Human evaluation results demonstrate the effectiveness and potential of SmartBT in producing meaningful and accurate comments for smart contracts directly from bytecode.
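The abstract does not say how the CFG is built from bytecode. The sketch below shows one common way to do it, assuming standard EVM opcode semantics. It is not SmartBT's own implementation. The code first disassembles the bytecode, skipping the immediate data of `PUSH1`..`PUSH32`. It then splits the instructions into basic blocks: a new block starts at every `JUMPDEST`, and a block ends after any jump or halting instruction. Edges are added for fall-through and for "static" jumps, meaning those whose target is pushed by the instruction immediately before `JUMP`/`JUMPI`. The function names `disassemble`, `basic_blocks`, and `build_cfg` are hypothetical and chosen only for illustration.

```python
# Hypothetical sketch of EVM bytecode -> control-flow graph (not SmartBT's code).
# Opcode byte values follow the EVM specification.
STOP, JUMP, JUMPI, JUMPDEST = 0x00, 0x56, 0x57, 0x5B
RETURN, REVERT, INVALID, SELFDESTRUCT = 0xF3, 0xFD, 0xFE, 0xFF
PUSH1, PUSH32 = 0x60, 0x7F

# Instructions after which execution never falls through to the next instruction.
HALTING = {STOP, JUMP, RETURN, REVERT, INVALID, SELFDESTRUCT}


def disassemble(code: bytes):
    """Return a list of (pc, opcode, immediate_value_or_None)."""
    instrs, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if PUSH1 <= op <= PUSH32:
            n = op - PUSH1 + 1  # PUSHn carries n bytes of immediate data
            arg = int.from_bytes(code[pc + 1: pc + 1 + n], "big")
            instrs.append((pc, op, arg))
            pc += 1 + n
        else:
            instrs.append((pc, op, None))
            pc += 1
    return instrs


def basic_blocks(instrs):
    """Split instructions into basic blocks keyed by their starting pc."""
    blocks, current = {}, []
    for ins in instrs:
        op = ins[1]
        # A JUMPDEST is a potential jump target, so it always starts a block.
        if op == JUMPDEST and current:
            blocks[current[0][0]] = current
            current = []
        current.append(ins)
        # Any jump or halting instruction closes the current block.
        if op in HALTING or op == JUMPI:
            blocks[current[0][0]] = current
            current = []
    if current:
        blocks[current[0][0]] = current
    return blocks


def build_cfg(code: bytes):
    """Return {block_start_pc: set of successor block start pcs}."""
    instrs = disassemble(code)
    blocks = basic_blocks(instrs)
    starts = sorted(blocks)
    jumpdests = {pc for pc, op, _ in instrs if op == JUMPDEST}
    cfg = {}
    for i, start in enumerate(starts):
        block = blocks[start]
        last_op = block[-1][1]
        succs = set()
        # Fall-through edge unless the block halts or jumps unconditionally.
        if last_op not in HALTING and i + 1 < len(starts):
            succs.add(starts[i + 1])
        # Static jump: the target was pushed right before JUMP/JUMPI.
        if last_op in (JUMP, JUMPI) and len(block) >= 2:
            _, prev_op, prev_arg = block[-2]
            if PUSH1 <= prev_op <= PUSH32 and prev_arg in jumpdests:
                succs.add(prev_arg)
        cfg[start] = succs
    return cfg
```

For example, `build_cfg(bytes.fromhex("600456005b6001600b57005b00"))` yields `{0: {4}, 3: set(), 4: {10, 11}, 10: set(), 11: set()}`. The block at pc 0 jumps unconditionally to pc 4, and the conditional jump at the end of that block either falls through to pc 10 or branches to pc 11. This sketch leaves dynamic jumps (targets computed at run time) unresolved. Resolving them requires stack analysis or symbolic execution, which practical CFG builders perform.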