The introduction of smart contract functionality marks the advent of the blockchain 2.0 era, enabling blockchain technology to support both digital currency transactions and complex distributed applications. However, many smart contracts have been found to contain vulnerabilities and errors, leading to the loss of assets on the blockchain. Although a range of tools has been developed to identify vulnerabilities in smart contracts at the source code or bytecode level, most rely on a single modality, which limits their performance, accuracy, and generalization. This paper proposes MultiCFV, a multimodal deep learning approach designed to analyze and detect erroneous control-flow vulnerabilities and to identify code clones in smart contracts. Bytecode generated from the source code is used to construct control flow graphs, from which graph embedding techniques extract graph features. Abstract syntax trees provide syntax features, and code comments supply key comment words and comment features. These three feature vectors are fused to build a database for code inspection, which is used to detect similar code and identify contract vulnerabilities. Experimental results demonstrate that our method effectively combines structural, syntactic, and semantic information, improving the accuracy of both smart contract vulnerability detection and clone detection.
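The fusion-and-lookup idea can be illustrated with a minimal, standard-library-only sketch. It is not the MultiCFV implementation. The per-modality extractors are hypothetical stand-ins: feature hashing of control-flow-graph degree pairs replaces a learned graph embedding, a bag of AST node types stands in for syntax features, and comment words longer than two letters stand in for keyword extraction. Fusion is assumed to be concatenation of L2-normalized vectors, and the inputs (`cfg_edges`, `ast`, `comments`) are assumed to have been produced beforehand, for example from disassembled bytecode and the compiler's AST output. The names `fuse`, `find_clones`, `DIM`, and the 0.9 similarity threshold are illustrative assumptions.

```python
import math
import re
import zlib
from collections import Counter

DIM = 16  # hypothetical size of each per-modality feature vector


def hashed_bag(tokens, dim=DIM):
    """Feature hashing: map a bag of string tokens into a fixed-length count vector."""
    vec = [0.0] * dim
    for tok in tokens:
        # crc32 is deterministic across runs, unlike Python's salted str hash
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    return vec


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def graph_features(edges):
    """Stand-in for a graph embedding: hash each CFG node's (out-degree, in-degree) pair."""
    out_deg, in_deg, nodes = Counter(), Counter(), set()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
        nodes.update((src, dst))
    return hashed_bag(f"deg:{out_deg[n]}:{in_deg[n]}" for n in nodes)


def syntax_features(ast_node_types):
    """Bag of AST node type names (assumed already extracted from the compiler's AST)."""
    return hashed_bag(ast_node_types)


def comment_features(comments):
    """Lower-cased comment words longer than two letters (crude keyword stand-in)."""
    words = re.findall(r"[a-z]+", " ".join(comments).lower())
    return hashed_bag(w for w in words if len(w) > 2)


def fuse(contract):
    """Assumed fusion: concatenate the three L2-normalized modality vectors."""
    return (l2_normalize(graph_features(contract["cfg_edges"]))
            + l2_normalize(syntax_features(contract["ast"]))
            + l2_normalize(comment_features(contract["comments"])))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def find_clones(database, contract, threshold=0.9):
    """Return (name, similarity) for stored contracts at or above threshold, most similar first."""
    query = fuse(contract)
    hits = [(name, cosine(query, vec)) for name, vec in database.items()]
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: -h[1])


# Usage: store one fused vector, then query the database with the same contract.
known = {
    "cfg_edges": [(0, 1), (0, 2), (1, 3), (2, 3)],
    "ast": ["ContractDefinition", "FunctionDefinition", "IfStatement", "FunctionCall"],
    "comments": ["Withdraw the caller's balance"],
}
database = {"wallet_v1": fuse(known)}
print(find_clones(database, known))  # the identical contract matches with similarity close to 1.0
```

Normalizing each modality before concatenation keeps one modality from dominating the cosine score merely because it yields more tokens; in a deep learning pipeline, a learned fusion layer would typically take the place of plain concatenation.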