SourceP: Smart Ponzi Schemes Detection on Ethereum Using Pre-training Model with Data Flow

As blockchain technology becomes increasingly popular, a classic financial scam, the Ponzi scheme, has also emerged on the blockchain platform Ethereum. Ponzi schemes deployed through smart contracts, known as smart Ponzi schemes, have caused substantial economic losses and other negative impacts. Existing methods for detecting smart Ponzi schemes on Ethereum rely mainly on the bytecode features, opcode features, account features, and transaction behavior features of smart contracts, and they lack interpretability and sustainability. In this paper, we propose SourceP, a method that detects smart Ponzi schemes on the Ethereum platform using pre-training models and data flow. SourceP uses only the source code of smart contracts as features, exploring the detection of smart Ponzi schemes from a different direction. Compared with existing detection methods, it reduces the difficulty of data acquisition and feature extraction while increasing the interpretability of the model. Specifically, we first convert the source code of a smart contract into a data flow graph and then introduce a pre-training model based on learning code representations to build a classification model that identifies Ponzi schemes in smart contracts. The experimental results show that SourceP achieves 87.2% recall and 90.7% F-score when detecting smart Ponzi schemes in a dataset of Ethereum smart contracts, outperforming state-of-the-art methods in both performance and sustainability. Additional experiments demonstrate that pre-training models and data flow make an important contribution to SourceP, and that SourceP has good generalization ability.
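The first step, converting source code into a data flow graph, can be illustrated with a toy sketch. This is not SourceP's implementation: it assumes only straight-line assignments written as `name = expr;` and uses regular expressions in place of a real Solidity parser. It records "value comes from" edges in the spirit of data-flow-aware code models such as GraphCodeBERT. The function `data_flow_edges` and the `name@line` node naming are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical, simplified pattern: an optional type name, a target variable,
# "=", an expression, and a terminating ";" (e.g. "uint fee = payout / 10;").
ASSIGN = re.compile(r"^\s*(?:\w+\s+)?(\w+)\s*=\s*(.+?);\s*$")
# Identifiers, including dotted members such as "msg.value".
IDENT = re.compile(r"\b[A-Za-z_]\w*(?:\.\w+)*")


def data_flow_edges(lines: List[str]) -> List[Tuple[str, str]]:
    """Return (source, target) pairs meaning 'target's value comes from source'.

    Hypothetical helper: only straight-line assignments are handled; control
    flow, function calls and compound operators such as "+=" are ignored.
    """
    last_def: Dict[str, str] = {}  # variable -> node of its most recent assignment
    edges: List[Tuple[str, str]] = []
    for lineno, line in enumerate(lines):
        match = ASSIGN.match(line)
        if match is None:
            continue  # not a simple assignment; skipped in this toy version
        target, expr = match.group(1), match.group(2)
        target_node = f"{target}@{lineno}"
        for name in IDENT.findall(expr):
            # Link to the latest assignment of the variable if there is one;
            # otherwise treat it as an external input such as msg.value.
            edges.append((last_def.get(name, name), target_node))
        last_def[target] = target_node
    return edges


if __name__ == "__main__":
    contract_body = [
        "uint payout = msg.value * 2;",
        "uint fee = payout / 10;",
        "payout = payout - fee;",
        "balance = balance + payout;",
    ]
    for source, target in data_flow_edges(contract_body):
        print(f"{source} -> {target}")
```

The reassignment on line 2 creates a new node `payout@2`, so the later use on line 3 links to the updated value rather than the original one. This per-definition granularity is what distinguishes a data flow graph from a plain list of co-occurring variables.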

