With the rapid development of blockchain technology, smart contracts have in recent years been widely used in finance, supply chains, the Internet of Things, and other fields. However, their security problems have become increasingly prominent: security incidents involving smart contracts occur frequently, and malicious code in a contract can lead to the loss of user assets and to system crashes. This paper presents a preliminary study of machine-learning-based malicious code detection for smart contracts. The main work and results are as follows. Feature extraction and vectorization of smart contracts are the first step in detecting malicious code with machine learning, and the way features are processed has a significant impact on detection results. This paper adopts an opcode vectorization method based on the smart contract text. Taking the structural characteristics of contract opcodes into account, the opcodes are first classified and simplified; the simplified opcodes are then converted into vectors with the N-Gram (N=2) and TF-IDF algorithms and fed into a machine learning model for training. For comparison, the raw opcodes are also vectorized directly with N-Gram and TF-IDF and used to train the machine learning model, and the training results of the two approaches are compared to determine which feature extraction method performs better. Finally, a classifier chain is applied to smart contract malicious code detection.
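The opcode vectorization step described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the paper's implementation: the opcode grouping in `OPCODE_CLASSES`, the helper names (`simplify`, `bigrams`, `tfidf`), and the toy opcode sequences are all hypothetical, and the TF-IDF variant (term count divided by document length, times a smoothed IDF of ln((1 + n) / (1 + df)) + 1) is one common choice assumed here.

```python
import math
from collections import Counter

# Hypothetical grouping of EVM opcode mnemonics into coarse classes.
# This mapping is illustrative only; it is not the paper's classification.
OPCODE_CLASSES = {
    "PUSH": "PUSH", "DUP": "DUP", "SWAP": "SWAP", "LOG": "LOG",
    "ADD": "ARITH", "SUB": "ARITH", "MUL": "ARITH", "DIV": "ARITH",
    "LT": "CMP", "GT": "CMP", "EQ": "CMP", "ISZERO": "CMP",
    "JUMP": "JUMP", "JUMPI": "JUMP",
    "SLOAD": "STORAGE", "SSTORE": "STORAGE",
    "CALL": "CALL", "DELEGATECALL": "CALL", "STATICCALL": "CALL",
}


def simplify(opcodes):
    """Map each opcode to its (hypothetical) class; unlisted opcodes become 'OTHER'."""
    # Stripping trailing digits merges variants such as PUSH1..PUSH32 or LOG0..LOG4.
    return [OPCODE_CLASSES.get(op.rstrip("0123456789"), "OTHER") for op in opcodes]


def bigrams(tokens):
    """N-Gram with N=2: consecutive token pairs joined by a space."""
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def tfidf(docs):
    """Return (vocabulary, matrix) for a list of token lists."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    # Smoothed inverse document frequency (assumed variant).
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    matrix = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for empty documents
        matrix.append([counts[t] / length * idf[t] for t in vocab])
    return vocab, matrix


# Two toy opcode sequences (hypothetical, not from the paper's dataset).
contracts = [
    ["PUSH1", "PUSH1", "MSTORE", "CALLVALUE", "DUP1", "ISZERO", "PUSH2", "JUMPI"],
    ["PUSH1", "SLOAD", "CALLER", "EQ", "PUSH2", "JUMPI", "DELEGATECALL"],
]
vocab, X = tfidf([bigrams(simplify(c)) for c in contracts])
print(len(vocab), "bigram features")  # prints: 11 bigram features
```

Passing `bigrams(c)` for each raw sequence, without `simplify`, yields the unsimplified baseline representation used for comparison. scikit-learn's `TfidfVectorizer` with `ngram_range=(2, 2)` computes a similar representation but differs in details (for example, it uses raw term counts and L2-normalizes each row by default), so its values will not match this sketch exactly.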
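The final step, applying a classifier chain to malicious code detection, can be illustrated with the minimal sketch below. It assumes the detection task is multi-label (several binary labels per contract), which the abstract does not state explicitly; the label meanings, the toy feature vectors, and the nearest-centroid base learner are hypothetical stand-ins for whatever machine learning model the paper actually uses. Each link in the chain receives the input features plus the labels earlier in the chain: the true labels during training and the predicted labels at inference.

```python
import math


class NearestCentroid:
    """Stand-in binary base classifier: predicts the label whose mean vector is closer."""

    def fit(self, X, y):
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            if not rows:
                raise ValueError(f"no training samples with label {label}")
            # Column-wise mean of all rows carrying this label.
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_one(self, x):
        distances = {label: math.dist(x, c) for label, c in self.centroids.items()}
        return min(distances, key=distances.get)


class ClassifierChain:
    """Model j sees the input features plus labels 0..j-1."""

    def __init__(self, n_labels, base=NearestCentroid):
        self.models = [base() for _ in range(n_labels)]

    def fit(self, X, Y):
        # Training uses the true earlier labels as extra features.
        for j, model in enumerate(self.models):
            augmented = [list(x) + list(y[:j]) for x, y in zip(X, Y)]
            model.fit(augmented, [y[j] for y in Y])
        return self

    def predict(self, X):
        # Inference feeds each model's prediction into the next model in the chain.
        results = []
        for x in X:
            labels = []
            for model in self.models:
                labels.append(model.predict_one(list(x) + labels))
            results.append(labels)
        return results


# Toy 2-D feature vectors (e.g. reduced TF-IDF rows) with two hypothetical labels:
# label 0 = "malicious", label 1 = "exhibits a specific malicious pattern".
X_train = [[0.0, 0.1], [0.1, 0.0], [0.9, 0.2], [1.0, 0.1], [0.9, 0.9], [1.0, 1.0]]
Y_train = [[0, 0], [0, 0], [1, 0], [1, 0], [1, 1], [1, 1]]

chain = ClassifierChain(n_labels=2).fit(X_train, Y_train)
print(chain.predict([[0.05, 0.0], [0.95, 0.95], [0.95, 0.15]]))
# prints: [[0, 0], [1, 1], [1, 0]]
```

For real experiments, scikit-learn provides `sklearn.multioutput.ClassifierChain`, which wraps an arbitrary scikit-learn estimator in the same way and lets the label `order` be specified.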