Rug pulls are a critical threat in the blockchain ecosystem. Yet the lack of sufficiently time-bound, well-structured datasets remains one of the main obstacles to early detection. Existing datasets fall short because of temporal leakage or reliance on post-collapse indicators, limited modality coverage, and ambiguous or incomplete labels, particularly for DeFi tokens. To address these problems, we present TM-RugPull, a carefully curated and strictly time-bound dataset of 1,000 projects spanning DeFi, meme, NFT, and celebrity tokens. We enforce temporal validation across all three modalities we acquire: on-chain behavior, smart contract metadata, and OSINT signals. Project labels are assigned through manual investigation of each project's entire lifespan and its collapse. We also release the dataset publicly, together with its codebase for data acquisition and feature extraction.
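To make "strictly time-bound" concrete, one common way to avoid temporal leakage is to compute features only from observations that were available before a fixed cutoff preceding the collapse, so post-collapse signals never reach the detector. The sketch below illustrates that idea under stated assumptions. The record type `Observation`, the helpers `time_bound` and `features_by_modality`, the modality strings, and the two-day detection horizon are all hypothetical. They are not the TM-RugPull schema or its released codebase.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Observation:
    # Hypothetical record: one raw observation about a token project.
    modality: str        # assumed labels: "onchain", "contract", or "osint"
    timestamp: datetime  # when the observation became publicly available
    value: float


def time_bound(observations, cutoff):
    """Keep only observations available strictly before the cutoff.

    Anything at or after the cutoff (e.g. post-collapse liquidity removal
    or news coverage of the rug pull) is dropped, so features cannot leak
    information about the outcome.
    """
    return [o for o in observations if o.timestamp < cutoff]


def features_by_modality(observations):
    # Hypothetical feature: per-modality sum of observation values.
    totals = {}
    for o in observations:
        totals[o.modality] = totals.get(o.modality, 0.0) + o.value
    return totals


if __name__ == "__main__":
    launch = datetime(2024, 1, 1)
    collapse = launch + timedelta(days=10)
    obs = [
        Observation("contract", launch, 1.0),
        Observation("onchain", launch + timedelta(days=1), 5.0),
        Observation("osint", launch + timedelta(days=3), 2.0),
        # Post-collapse signal: must not be visible to an early detector.
        Observation("osint", collapse + timedelta(days=1), 100.0),
    ]
    # Hypothetical early-detection horizon: two days before the collapse.
    cutoff = collapse - timedelta(days=2)
    print(features_by_modality(time_bound(obs, cutoff)))
```

Here the large post-collapse OSINT value is excluded, so the `osint` feature reflects only pre-cutoff evidence. In a real pipeline, the cutoff would be chosen per project and applied uniformly to every modality before any feature is computed.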