Rug-pull attacks pose a systemic threat across the blockchain ecosystem, yet research into early detection is hindered by the lack of scientific-grade datasets. Existing resources often suffer from temporal data leakage, narrow modality, and ambiguous labeling, particularly outside DeFi contexts. To address these limitations, we present TM-RugPull, a rigorously curated, leakage-resistant dataset of 1,028 token projects spanning DeFi, meme coins, NFTs, and celebrity-themed tokens. TM-RugPull enforces strict temporal hygiene: all features (on-chain behavior, smart contract metadata, and OSINT signals) are extracted exclusively from the first half of each project's lifespan. Labels are grounded in forensic reports and longevity criteria and are verified through multi-expert consensus. The dataset enables causally valid, multimodal analysis of rug-pull dynamics and establishes a new benchmark for reproducible fraud detection research.
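The first-half-of-lifespan rule can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Event` record, the function names, the definition of a project's lifespan as the interval between a known launch time and a known end time, and the inclusive cutoff are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """Hypothetical record for one observed signal (on-chain or OSINT)."""
    timestamp: datetime
    kind: str  # illustrative labels such as "transfer" or "social_post"


def observation_cutoff(launch: datetime, end: datetime) -> datetime:
    """Return the midpoint of the project's lifespan [launch, end]."""
    if end < launch:
        raise ValueError("end must not precede launch")
    return launch + (end - launch) / 2


def first_half_events(events: list[Event], launch: datetime, end: datetime) -> list[Event]:
    """Keep only events observed in the first half of the lifespan.

    Assumption: the cutoff is inclusive. Anything after it is discarded so
    that features cannot encode information from the period in which the
    outcome (rug pull or survival) becomes visible.
    """
    cutoff = observation_cutoff(launch, end)
    return [e for e in events if launch <= e.timestamp <= cutoff]


launch = datetime(2024, 1, 1)
end = datetime(2024, 1, 11)
events = [
    Event(datetime(2024, 1, 2), "transfer"),
    Event(datetime(2024, 1, 6), "social_post"),
    Event(datetime(2024, 1, 9), "liquidity_removal"),
]
kept = first_half_events(events, launch, end)
```

With a ten-day lifespan the cutoff falls on 6 January, so the liquidity removal on 9 January is excluded from feature extraction.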