The Smart Contract Weakness Classification Registry (SWC Registry) is a widely recognized catalog of smart contract weaknesses specific to the Ethereum platform. In recent years, substantial research effort has gone into building tools that detect SWC weaknesses. However, evaluating these tools has proven challenging because no large, unbiased, real-world dataset exists. To address this gap, we recruited 22 participants and spent 44 person-months analyzing 1,322 open-source audit reports from 30 security teams. In total, we identified 10,016 weaknesses and built two distinct datasets: DAppSCAN-Source and DAppSCAN-Bytecode. DAppSCAN-Source comprises 25,077 Solidity files containing 1,689 SWC vulnerabilities drawn from 1,139 real-world DApp projects. Because the Solidity files in this dataset are not necessarily directly compilable, we developed a tool that automatically identifies dependency relationships within DApps and completes missing public libraries. Using this tool, we created DAppSCAN-Bytecode, which consists of 8,167 compiled smart contract bytecode files containing 895 SWC weaknesses. On this second dataset, we conducted an empirical study of five state-of-the-art smart contract vulnerability detection tools. The results reveal subpar performance in terms of both effectiveness and detection success rate, indicating that future development should prioritize real-world datasets over simplistic toy contracts.
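The abstract does not describe how the dependency-completion tool works internally. As a rough illustration only, the following is a minimal Python sketch, under our own assumptions, of the first step such a tool might take: extracting `import` paths from Solidity files, resolving relative imports against the other files in the project, and flagging unresolved imports that look like public libraries. The regex-based import extraction is a simplification (a real tool would more likely use a proper Solidity parser and the compiler's import remappings), and `KNOWN_PUBLIC_PREFIXES` and all function names are hypothetical.

```python
import posixpath
import re

# Simplified matcher for Solidity import forms such as:
#   import "./A.sol";
#   import {X} from "../B.sol";
#   import * as Y from "@openzeppelin/contracts/token/ERC20/ERC20.sol";
IMPORT_RE = re.compile(
    r'^\s*import\s+(?:[^"\';]*?\s+from\s+)?["\']([^"\']+)["\']', re.MULTILINE
)

# Hypothetical list of package prefixes treated as "public libraries".
KNOWN_PUBLIC_PREFIXES = ("@openzeppelin/", "@chainlink/")


def find_imports(source: str) -> list[str]:
    """Return the import paths appearing in one Solidity source file."""
    return IMPORT_RE.findall(source)


def build_dependency_graph(
    files: dict[str, str],
) -> tuple[dict[str, set[str]], set[str]]:
    """Map each file to the project files it imports; collect unresolved imports."""
    graph: dict[str, set[str]] = {}
    missing: set[str] = set()
    for path, source in files.items():
        deps: set[str] = set()
        for imp in find_imports(source):
            if imp.startswith(("./", "../")):
                # Relative import: resolve against the importing file's directory.
                target = posixpath.normpath(
                    posixpath.join(posixpath.dirname(path), imp)
                )
            else:
                # Package-style import: keep the path as written.
                target = imp
            if target in files:
                deps.add(target)
            else:
                missing.add(target)
        graph[path] = deps
    return graph, missing


def missing_public_libraries(missing: set[str]) -> set[str]:
    """Unresolved imports that a completion step could try to fetch."""
    return {m for m in missing if m.startswith(KNOWN_PUBLIC_PREFIXES)}


if __name__ == "__main__":
    project = {
        "contracts/Token.sol": (
            'import "./Lib.sol";\n'
            'import "@openzeppelin/contracts/token/ERC20/ERC20.sol";\n'
            "contract Token {}"
        ),
        "contracts/Lib.sol": "library Lib {}",
    }
    graph, missing = build_dependency_graph(project)
    print(graph)    # {'contracts/Token.sol': {'contracts/Lib.sol'}, 'contracts/Lib.sol': set()}
    print(missing_public_libraries(missing))  # {'@openzeppelin/contracts/token/ERC20/ERC20.sol'}
```

Once the missing public libraries are identified, a completion step could place pinned copies of those packages where the compiler expects them before compiling; that step depends on details (library versions, remappings) that the abstract does not give.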