Detecting DeFi Securities Violations from Token Smart Contract Code

Decentralized Finance (DeFi) is a system of financial products and services built and delivered through smart contracts on various blockchains. In the past year, DeFi has grown in popularity and market capitalization. However, it has also been connected to crime, in particular to various types of securities violations. The lack of Know Your Customer requirements in DeFi poses challenges for governments trying to mitigate potential offending in this space. This study asks whether the problem is suited to a machine learning approach, namely, whether we can identify DeFi projects potentially engaging in securities violations based on their tokens' smart contract code. We adapt prior work on detecting specific types of securities violations across Ethereum, building classifiers on features extracted from the smart contract code of DeFi projects' tokens. The final logistic regression model achieves an F1 score of 98.9%; the final random forest classifier achieves an F1 score of 98.6%. Further feature-level analysis shows that a single feature makes this a highly detectable problem. This heavy reliance on a single feature means that, at this stage, a complex machine learning model may be neither necessary nor desirable for this problem. However, this may change as DeFi securities violations become more sophisticated. Another contribution of our study is a new dataset, comprising (a) a verified ground truth dataset of tokens involved in securities violations and (b) a set of legitimate tokens from a reputable DeFi aggregator. The paper further discusses the potential use of a model like ours by prosecutors in enforcement efforts and connects it to the wider legal context.
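To make the reported metric and the single-feature observation concrete, the following is a minimal sketch rather than the study's pipeline. It computes an F1 score by hand and evaluates a one-feature threshold rule of the kind that could serve as a simple baseline when one feature carries most of the signal. The feature names (`feature_a`, `feature_b`), the threshold, and the six toy rows are all hypothetical and exist only for illustration; they are not drawn from the paper's dataset.

```python
from typing import Dict, List


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """F1 = harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def single_feature_rule(
    rows: List[Dict[str, float]], feature: str, threshold: float
) -> List[int]:
    """Flag a token (1) when one chosen feature exceeds a threshold."""
    return [1 if row[feature] > threshold else 0 for row in rows]


# Toy, made-up feature vectors (hypothetical; NOT data from the study).
rows = [
    {"feature_a": 9, "feature_b": 3},
    {"feature_a": 8, "feature_b": 1},
    {"feature_a": 7, "feature_b": 4},
    {"feature_a": 1, "feature_b": 2},
    {"feature_a": 2, "feature_b": 3},
    {"feature_a": 6, "feature_b": 1},
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = flagged as a violation, 0 = legitimate

preds = single_feature_rule(rows, "feature_a", threshold=5)
score = f1_score(labels, preds)
print(f"F1 of the single-feature rule on the toy data: {score:.3f}")
```

A rule like this is easy to audit and explain, which is one reason a simple baseline may be preferable to a complex model when a single feature dominates.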
