Rug-pull attacks pose a systemic threat across the blockchain ecosystem, yet research on early detection is hindered by a lack of scientific-grade datasets. Existing resources often suffer from temporal data leakage, narrow modality coverage, and ambiguous labeling, particularly outside DeFi contexts. To address these limitations, we present TM-RugPull, a rigorously curated, leakage-resistant dataset of 1,028 token projects spanning DeFi, meme coins, NFTs, and celebrity-themed tokens. TM-RugPull enforces strict temporal hygiene: all features, covering on-chain behavior, smart contract metadata, and OSINT signals, are extracted exclusively from the first half of each project's lifespan. Labels are grounded in forensic reports and longevity criteria and verified through multi-expert consensus. The dataset enables causally valid, multimodal analysis of rug-pull dynamics and establishes a new benchmark for reproducible fraud detection research.
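The temporal-hygiene rule (features drawn only from the first half of a project's lifespan) can be illustrated with a small sketch. It assumes each project has a known launch time and a lifespan end, such as the rug-pull date or the end of the observation window. The `Project` and `Event` records, the modality tags, and the toy feature names are hypothetical and do not reflect the dataset's actual schema or feature set.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    """One timestamped observation (hypothetical record type)."""
    timestamp: datetime
    source: str        # hypothetical modality tag: "onchain", "contract", or "osint"
    address: str = ""  # counterparty address; empty for non-transfer events


@dataclass(frozen=True)
class Project:
    """A token project with its raw event stream (hypothetical record type)."""
    token: str
    launch: datetime
    lifespan_end: datetime  # assumed: rug-pull date or end of observation window
    events: Tuple[Event, ...]


def observation_cutoff(p: Project) -> datetime:
    """Midpoint of the lifespan; no feature may use data at or after this instant."""
    return p.launch + (p.lifespan_end - p.launch) / 2


def first_half_events(p: Project) -> List[Event]:
    """Keep only events in the half-open window [launch, cutoff)."""
    cutoff = observation_cutoff(p)
    return [e for e in p.events if p.launch <= e.timestamp < cutoff]


def extract_features(p: Project) -> dict:
    """Toy per-modality counts computed solely from first-half events."""
    window = first_half_events(p)
    onchain = [e for e in window if e.source == "onchain"]
    return {
        "token": p.token,
        "n_onchain_events": len(onchain),
        "n_unique_addresses": len({e.address for e in onchain if e.address}),
        "n_contract_events": sum(e.source == "contract" for e in window),
        "n_osint_events": sum(e.source == "osint" for e in window),
    }


# Usage: a 10-day lifespan puts the cutoff at day 5.
t0 = datetime(2024, 1, 1)
proj = Project(
    token="EXAMPLE",
    launch=t0,
    lifespan_end=t0 + timedelta(days=10),
    events=(
        Event(t0 + timedelta(days=1), "onchain", "0xaaa"),
        Event(t0 + timedelta(days=2), "onchain", "0xbbb"),
        Event(t0 + timedelta(days=3), "osint"),
        Event(t0 + timedelta(days=4), "contract"),
        Event(t0 + timedelta(days=6), "onchain", "0xccc"),  # second half: excluded
        Event(t0 + timedelta(days=9), "osint"),              # second half: excluded
    ),
)
print(extract_features(proj))
# {'token': 'EXAMPLE', 'n_onchain_events': 2, 'n_unique_addresses': 2, 'n_contract_events': 1, 'n_osint_events': 1}
```

Filtering the event stream before any feature is computed, rather than masking features afterwards, means that no downstream aggregate can accidentally see post-cutoff activity such as the liquidity removal itself.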