SourceP: Detecting Ponzi Schemes on Ethereum with Source Code

As blockchain technology grows in popularity, a typical financial scam, the Ponzi scheme, has also emerged on the blockchain platform Ethereum. Ponzi schemes deployed through smart contracts, known as smart Ponzi schemes, have caused substantial economic losses and other negative impacts. Existing methods for detecting smart Ponzi schemes on Ethereum rely mainly on the bytecode, opcode, account, and transaction behavior features of smart contracts, and their detection performance remains insufficient. In this paper, we propose SourceP, a method that detects smart Ponzi schemes on the Ethereum platform using pre-trained models and data flow. SourceP uses only the source code of smart contracts as features, exploring the detection of smart Ponzi schemes from a different direction. It reduces the difficulty of data acquisition and feature extraction faced by existing detection methods while increasing the interpretability of the model. Specifically, we first convert the source code of a smart contract into a data flow graph and then introduce a pre-trained model based on learning code representations to build a classification model that identifies Ponzi schemes in smart contracts. The experimental results show that SourceP achieves 87.2% recall and 90.7% F-score for detecting smart Ponzi schemes on a dataset of Ethereum smart contracts, outperforming state-of-the-art methods in terms of performance and sustainability. Additional experiments further demonstrate that pre-trained models and data flow contribute significantly to SourceP and that SourceP has good generalization ability.
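The abstract does not specify how the data flow graph is constructed. The following is a hypothetical, much-simplified sketch in Python, assuming a GraphCodeBERT-style "where the value comes from" graph in which each assigned variable is linked to the most recent definitions of the variables it reads. The function name `data_flow_edges`, the regular-expression grammar, and the sample statements are illustrative assumptions, not SourceP's implementation. The sketch handles only straight-line assignments without function calls or literals such as hex numbers; a real extractor would parse full Solidity with a proper parser.

```python
import re
from typing import Dict, List, Optional, Tuple

# A variable occurrence: (name, statement index). An index of None means the
# value comes from outside the snippet (a parameter, state variable, or a
# global such as msg.value).
Node = Tuple[str, Optional[int]]
Edge = Tuple[Node, Node]  # (where the value comes from, where it flows to)

# Hypothetical toy grammar: optional type keyword, target, assignment
# operator (=, +=, -=, *=, /=, but not ==), expression, semicolon.
ASSIGN = re.compile(
    r"^\s*(?:[A-Za-z_]\w*\s+)?([A-Za-z_][\w.]*)\s*([+\-*/]?=)(?!=)\s*(.+?)\s*;\s*$"
)
# Identifiers, keeping dotted names such as msg.value as a single variable.
IDENT = re.compile(r"\b[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*")


def data_flow_edges(statements: List[str]) -> List[Edge]:
    """Build 'where the value comes from' edges for straight-line assignments."""
    last_def: Dict[str, int] = {}  # variable name -> index of its latest assignment
    edges: List[Edge] = []
    for i, stmt in enumerate(statements):
        m = ASSIGN.match(stmt)
        if m is None:
            continue  # not an assignment this toy parser understands
        target, op, expr = m.groups()
        sources = IDENT.findall(expr)
        if op != "=":
            sources.insert(0, target)  # x += y also reads the old value of x
        for src in dict.fromkeys(sources):  # de-duplicate while keeping order
            edges.append(((src, last_def.get(src)), (target, i)))
        last_def[target] = i
    return edges


if __name__ == "__main__":
    # Hypothetical payout logic: an incoming payment is split and pooled.
    payout = [
        "uint256 fee = msg.value / 10;",
        "uint256 share = msg.value - fee;",
        "pool += share;",
        "reward = pool / 2;",
    ]
    for src, dst in data_flow_edges(payout):
        print(src, "->", dst)
```

In this sample, the edges trace the incoming `msg.value` through `fee` and `share` into `pool` and then into `reward`. Money-flow dependencies of this kind are what a data-flow-aware pre-trained model could attend to alongside the code tokens.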

翻译：随着区块链技术日益普及，一种典型的金融骗局——庞氏骗局——也出现在区块链平台以太坊上。这种通过智能合约部署的庞氏骗局（又称智能庞氏骗局）已造成大量经济损失和负面影响。现有检测以太坊上智能庞氏骗局的方法主要依赖智能合约的字节码特征、操作码特征、账户特征和交易行为特征，但识别性能不足。本文提出SourceP，一种利用预训练模型和数据流检测以太坊平台智能庞氏骗局的方法，该方法仅需使用智能合约的源代码作为特征，从另一角度探索检测智能庞氏骗局的可能性。SourceP降低了现有检测方法的数据获取和特征提取难度，同时提升了模型的可解释性。具体而言，我们首先将智能合约源代码转换为数据流图，然后引入基于代码表征学习的预训练模型，构建分类模型以识别智能合约中的庞氏骗局。实验结果表明，SourceP在以太坊智能合约数据集中检测智能庞氏骗局时，召回率达87.2%，F值达90.7%，在性能和可持续性方面均优于现有最优方法。我们通过额外实验进一步证明了预训练模型和数据流对SourceP的重要贡献，并验证了SourceP具有良好的泛化能力。