Smart contracts deployed on blockchain platforms are exposed to a wide range of security vulnerabilities. However, only a small fraction of Ethereum contracts have publicly available source code, so vulnerability detection at the bytecode level is crucial. This paper introduces SmartBugBert, a novel approach that combines BERT-based deep learning with control flow graph (CFG) analysis to detect vulnerabilities directly from bytecode. Our method first decompiles smart contract bytecode into optimized opcode sequences, extracts semantic features using TF-IDF, constructs control flow graphs to capture execution logic, and isolates vulnerable CFG fragments for targeted analysis. By integrating semantic and structural information through a fine-tuned BERT model and a LightGBM classifier, our approach effectively identifies four critical vulnerability types: transaction-ordering dependency, access control, self-destruct, and timestamp dependency vulnerabilities. Experimental evaluation on 6,157 Ethereum smart contracts shows that SmartBugBert achieves 90.62% precision, 91.76% recall, and a 91.19% F1-score, significantly outperforming existing detection methods. Ablation studies confirm that combining semantic features with CFG information substantially improves detection performance. Furthermore, our approach remains efficient, requiring 0.14 seconds per contract, which makes it practical for large-scale vulnerability assessment.
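The bytecode-to-CFG step summarized above can be illustrated with a minimal Python sketch. It is not SmartBugBert's implementation: the partial opcode table, the rule that resolves only static `PUSHn <dest>; JUMP(I)` targets, and the hypothetical `vulnerable_fragment` heuristic (blocks containing a chosen opcode such as `TIMESTAMP` or `SELFDESTRUCT`, plus their direct successors) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Partial EVM opcode table (illustrative subset; unknown bytes become UNKNOWN_0x..).
OPCODES = {
    0x00: "STOP", 0x01: "ADD", 0x10: "LT", 0x14: "EQ", 0x15: "ISZERO",
    0x32: "ORIGIN", 0x33: "CALLER", 0x42: "TIMESTAMP", 0x50: "POP",
    0x54: "SLOAD", 0x55: "SSTORE", 0x56: "JUMP", 0x57: "JUMPI",
    0x5B: "JUMPDEST", 0xF1: "CALL", 0xF3: "RETURN", 0xFD: "REVERT",
    0xFE: "INVALID", 0xFF: "SELFDESTRUCT",
}
# Instructions that end a basic block (JUMPI's fall-through edge is added separately).
TERMINATORS = {"STOP", "JUMP", "JUMPI", "RETURN", "REVERT", "INVALID", "SELFDESTRUCT"}


@dataclass
class Block:
    start: int                                   # byte offset of the first instruction
    instrs: list = field(default_factory=list)   # (offset, opcode, push_arg_or_None)
    succs: set = field(default_factory=set)      # start offsets of successor blocks


def disassemble(code: bytes) -> list:
    """Turn raw bytecode into (offset, opcode, immediate) tuples."""
    instrs, pc = [], 0
    while pc < len(code):
        byte = code[pc]
        if 0x60 <= byte <= 0x7F:                 # PUSH1..PUSH32 carry 1..32 immediate bytes
            n = byte - 0x5F
            arg = int.from_bytes(code[pc + 1:pc + 1 + n], "big")
            instrs.append((pc, f"PUSH{n}", arg))
            pc += 1 + n
        else:
            instrs.append((pc, OPCODES.get(byte, f"UNKNOWN_{byte:#04x}"), None))
            pc += 1
    return instrs


def build_cfg(instrs: list) -> dict:
    """Split instructions into basic blocks and add control-flow edges."""
    blocks, current = {}, None
    for off, op, arg in instrs:
        if current is None or op == "JUMPDEST":
            if current is not None:              # previous block falls through into this JUMPDEST
                current.succs.add(off)
            current = Block(off)
            blocks[off] = current
        current.instrs.append((off, op, arg))
        if op in TERMINATORS:
            current = None
    next_off = {instrs[i][0]: instrs[i + 1][0] for i in range(len(instrs) - 1)}
    for block in blocks.values():
        off, op, _ = block.instrs[-1]
        if op in ("JUMP", "JUMPI") and len(block.instrs) >= 2:
            _, prev_op, target = block.instrs[-2]
            # Only the static "PUSHn <dest>; JUMP(I)" pattern is resolved here.
            if (prev_op.startswith("PUSH") and target in blocks
                    and blocks[target].instrs[0][1] == "JUMPDEST"):
                block.succs.add(target)
        if op == "JUMPI" and off in next_off:
            block.succs.add(next_off[off])       # fall-through edge when the condition is false
    return blocks


def vulnerable_fragment(blocks: dict, opcode: str) -> set:
    """Hypothetical heuristic: blocks that use `opcode` plus their direct successors."""
    seeds = {s for s, b in blocks.items() if any(op == opcode for _, op, _ in b.instrs)}
    return seeds | {t for s in seeds for t in blocks[s].succs}


if __name__ == "__main__":
    # TIMESTAMP; PUSH1 0x05; JUMPI; STOP; JUMPDEST; CALLER; SELFDESTRUCT
    cfg = build_cfg(disassemble(bytes.fromhex("42600557005b33ff")))
    for start, block in sorted(cfg.items()):
        print(start, [op for _, op, _ in block.instrs], sorted(block.succs))
    print("timestamp fragment:", sorted(vulnerable_fragment(cfg, "TIMESTAMP")))
```

Resolving only constant jump targets keeps the sketch short; jumps whose destination is computed at run time would need stack-value analysis, and a full decompiler would also handle the complete opcode set.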