Malicious developer intents in smart contracts constitute significant security threats to decentralized applications, leading to substantial economic losses. Prior work introduced SmartIntentNN, a deep learning model for detecting unsafe developer intents. By combining the Universal Sentence Encoder, a K-means clustering-based intent highlighting mechanism, and a Bidirectional Long Short-Term Memory (BiLSTM) network, the model achieved an F1 score of 0.8633 on an evaluation set of 10,000 real-world smart contracts across ten distinct intent categories. This paper presents SmartIntentV2 (Smart Contract Intent Neural Network Version 2). The primary enhancement is the integration of a BERT-based pre-trained programming language model, which we domain-adaptively pre-train on a dataset of 16,000 real-world smart contracts using a Masked Language Modeling objective. SmartIntentV2 retains the BiLSTM-based multi-label classification network for intent detection. On the same evaluation set of 10,000 smart contracts, it achieves superior performance with an accuracy of 0.9789, precision of 0.9090, recall of 0.9476, and an F1 score of 0.9279, substantially outperforming its predecessor and other baseline models. Notably, SmartIntentV2 also delivers a 65.5% relative improvement in F1 score over GPT-4.1 on this specialized task. These results establish SmartIntentV2 as a new state-of-the-art model for smart contract intent detection.
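The reported figures can be cross-checked with a little arithmetic. The sketch below recomputes the F1 score from the stated precision and recall, and derives two values the abstract does not state directly: the relative F1 gain over SmartIntentNN and the GPT-4.1 F1 score implied by the 65.5% relative improvement. The function names are hypothetical, and the derived values are back-of-the-envelope estimates from the rounded numbers above, not results reported by the authors.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_gain(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, as a fraction."""
    return (new - old) / old


def implied_baseline(score: float, gain: float) -> float:
    """Baseline score implied by a relative gain (inverse of relative_gain)."""
    return score / (1.0 + gain)


# Values taken from the abstract.
PRECISION, RECALL = 0.9090, 0.9476
F1_V2, F1_V1 = 0.9279, 0.8633
GAIN_OVER_GPT = 0.655  # 65.5% relative improvement in F1

# Derived values (assumptions of this sketch, not reported results).
recomputed_f1 = f1_score(PRECISION, RECALL)
gain_over_v1 = relative_gain(F1_V2, F1_V1)
implied_gpt_f1 = implied_baseline(F1_V2, GAIN_OVER_GPT)

print(f"F1 recomputed from precision and recall: {recomputed_f1:.4f}")
print(f"Relative F1 gain over SmartIntentNN:     {gain_over_v1:.1%}")
print(f"Implied GPT-4.1 F1 score:                {implied_gpt_f1:.4f}")
```

The recomputed F1 agrees with the reported 0.9279 to four decimal places, so the headline metrics are internally consistent. Because the inputs are rounded, the derived gain and the implied baseline are approximate.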