Detecting DeFi Securities Violations from Token Smart Contract Code

Decentralized Finance (DeFi) is a system of financial products and services built and delivered through smart contracts on various blockchains. In the past year, DeFi has gained popularity and market capitalization. However, it has also been connected to crime, particularly various types of securities violations. The lack of Know Your Customer requirements in DeFi poses challenges to governments trying to mitigate potential offenses in this space. This study asks whether the problem is suited to a machine learning approach, namely, whether we can identify DeFi projects potentially engaging in securities violations based on their tokens' smart contract code. We adapt prior work on detecting specific types of securities violations across Ethereum, building classifiers on features extracted from the smart contract code of DeFi projects' tokens (specifically, opcode-based features). Our final model is a random forest that achieves an F1 score of 80\%, against a baseline of 50\%. We further examine the code-based features most important to our model's performance, analyzing tokens' Solidity code and conducting cosine similarity analyses. We find that one element of the code our opcode-based features may be capturing is the implementation of the SafeMath library, though this does not account for the entirety of our features. Another contribution of our study is a new data set, comprising (a) a verified ground truth data set of tokens involved in securities violations and (b) a set of legitimate tokens from a reputable DeFi aggregator. This paper further discusses the potential use of a model like ours by prosecutors in enforcement efforts and connects it to the wider legal context.
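To make the notion of opcode-based features and cosine similarity concrete, the following minimal Python sketch shows one way such features could be computed from a token's deployed bytecode. It is an illustration under stated assumptions, not the pipeline used in this study: the function names (`disassemble`, `opcode_frequencies`, `cosine_similarity`), the deliberately partial opcode table, and the choice of relative opcode frequencies as the feature representation are all hypothetical.

```python
from collections import Counter
from math import sqrt

# Partial EVM opcode table. Assumption: only a few names are listed for
# illustration; any other byte is labelled by its hex value.
OPCODE_NAMES = {
    0x00: "STOP", 0x01: "ADD", 0x02: "MUL", 0x03: "SUB", 0x04: "DIV",
    0x10: "LT", 0x11: "GT", 0x14: "EQ", 0x15: "ISZERO",
    0x50: "POP", 0x51: "MLOAD", 0x52: "MSTORE", 0x54: "SLOAD", 0x55: "SSTORE",
    0x56: "JUMP", 0x57: "JUMPI", 0x5B: "JUMPDEST", 0xF3: "RETURN", 0xFD: "REVERT",
}


def disassemble(bytecode_hex):
    """Turn a hex bytecode string into a list of opcode names."""
    if bytecode_hex.startswith("0x"):
        bytecode_hex = bytecode_hex[2:]
    code = bytes.fromhex(bytecode_hex)
    ops = []
    i = 0
    while i < len(code):
        byte = code[i]
        if 0x60 <= byte <= 0x7F:
            # PUSH1..PUSH32 carry 1..32 bytes of immediate data, which
            # must be skipped so it is not misread as opcodes.
            width = byte - 0x5F
            ops.append(f"PUSH{width}")
            i += 1 + width
        else:
            ops.append(OPCODE_NAMES.get(byte, f"0x{byte:02x}"))
            i += 1
    return ops


def opcode_frequencies(ops):
    """Relative frequency of each opcode (hypothetical feature choice)."""
    counts = Counter(ops)
    total = sum(counts.values())
    return {op: n / total for op, n in counts.items()}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse feature dictionaries."""
    dot = sum(value * b.get(op, 0.0) for op, value in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two toy bytecode fragments: PUSH1 1, PUSH1 2, ADD versus PUSH1 1, PUSH1 2, SUB.
features_a = opcode_frequencies(disassemble("0x6001600201"))
features_b = opcode_frequencies(disassemble("0x6001600203"))
similarity = cosine_similarity(features_a, features_b)
```

Feature dictionaries of this kind could then be converted to fixed-length vectors and passed to a classifier, for example scikit-learn's `RandomForestClassifier`; that step is omitted here to keep the sketch free of third-party dependencies.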
