A Risk-Stratified Benchmark Dataset for Bad Randomness (SWC-120) Vulnerabilities in Ethereum Smart Contracts

Many Ethereum smart contracts rely on block attributes such as block.timestamp or blockhash to generate random numbers for applications like lotteries and games. However, these values are predictable and miner-manipulable, creating the Bad Randomness vulnerability (SWC-120) that has led to real-world exploits. Current detection tools identify only simple patterns and fail to verify whether protective modifiers actually guard vulnerable code. A major obstacle to improving these tools is the lack of large, accurately labeled datasets. This paper presents a benchmark dataset of 1,752 Ethereum smart contracts with validated Bad Randomness vulnerabilities. We developed a five-phase methodology comprising keyword filtering, pattern matching with 58 regular expressions, risk classification, function-level validation, and context analysis. The function-level validation revealed that 49% of contracts initially classified as protected were actually exploitable because modifiers were applied to different functions than those containing vulnerabilities. We classify contracts into four risk levels based on exploitability: HIGH_RISK (no protection), MEDIUM_RISK (miner-exploitable only), LOW_RISK (owner-exploitable only), and SAFE (using Chainlink VRF or commit-reveal). Our dataset is 51 times larger than the RNVulDet dataset and the first to provide function-level validation and risk stratification. Evaluation of Slither and Mythril revealed significant detection gaps: neither tool identified any of the vulnerable contracts in our sample, indicating limitations in handling complex randomness patterns. The dataset and validation scripts are publicly available to support future research in smart contract security.
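
To illustrate why function-level validation matters, the following is a minimal Python sketch, not the validation scripts released with the dataset. It assumes two illustrative weak-randomness patterns (the paper's 58 regular expressions are not reproduced here), a hypothetical guard modifier named onlyOwner, and simple brace matching in place of a real Solidity parser. A contract-level check sees both a modifier and a block attribute and would label the contract as protected, while the function-level check shows that the modifier sits on a different function.

```python
import re

# Illustrative patterns only; these are assumptions, not the paper's 58 regexes.
WEAK_RANDOMNESS = re.compile(r"\bblock\.timestamp\b|\bblockhash\s*\(")
# Captures the function name and everything between ")" and "{" (visibility, modifiers).
FUNCTION_HEADER = re.compile(r"\bfunction\s+(\w+)\s*\([^)]*\)([^{;]*)\{")
# Hypothetical set of modifier names treated as guards.
GUARD_MODIFIERS = {"onlyOwner"}


def function_bodies(source):
    """Yield (name, header_tail, body) for each function, using naive brace matching."""
    for match in FUNCTION_HEADER.finditer(source):
        depth, i = 1, match.end()
        while i < len(source) and depth:
            if source[i] == "{":
                depth += 1
            elif source[i] == "}":
                depth -= 1
            i += 1
        yield match.group(1), match.group(2), source[match.end():i - 1]


def contract_level_protected(source):
    """Naive check: a guard modifier appears anywhere in the contract."""
    return any(name in source for name in GUARD_MODIFIERS)


def unguarded_random_functions(source):
    """Names of functions that use weak randomness without a guard on that same function."""
    exposed = []
    for name, header_tail, body in function_bodies(source):
        if not WEAK_RANDOMNESS.search(body):
            continue
        header_words = set(re.findall(r"\w+", header_tail))
        if not (header_words & GUARD_MODIFIERS):
            exposed.append(name)
    return exposed


# Hypothetical contract: onlyOwner guards setFee, but draw() holds the weak randomness.
EXAMPLE = """
contract Lottery {
    address owner;
    modifier onlyOwner() { require(msg.sender == owner); _; }
    function setFee(uint fee) public onlyOwner { }
    function draw() public returns (uint) {
        return uint(keccak256(abi.encodePacked(block.timestamp, blockhash(block.number - 1)))) % 100;
    }
}
"""

print(contract_level_protected(EXAMPLE))    # modifier exists somewhere in the contract
print(unguarded_random_functions(EXAMPLE))  # functions that are still exposed
```

In this sketch the contract-level check reports a guard while draw() remains callable by anyone, which is the kind of mismatch the function-level validation phase is described as catching. A production check would need a proper Solidity parser to handle comments, strings, inheritance, and internal calls.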