While smart contracts are foundational elements of blockchain applications, their inherent susceptibility to security vulnerabilities poses a significant challenge. The training datasets used by vulnerability detection tools may be limited, potentially compromising the tools' efficacy. This paper presents a method for improving the quantity and quality of smart contract vulnerability datasets and uses it to evaluate current detection methods. The approach centers on semantic-preserving code transformation, a technique that modifies the structure of source code without altering its meaning. The transformed vulnerable snippets are inserted at every candidate location within benign smart contract code, creating new vulnerable versions of each contract. The method aims to generate a wider variety of vulnerable code, including samples that evade detection by current analysis tools. The experiments evaluate the method's effectiveness using Slither, Mythril, and CrossFuzz, focusing on the number of vulnerable samples generated and the false negative rate of the tools on those samples. The results show that many of the newly created vulnerabilities evade the tools, with the false negative rate reaching up to 100%, and that the method increases dataset size by at least 2.5×.
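The transform-and-insert idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the snippet, the benign contract, the two transformations (wrapping in an always-true branch, renaming a local variable), and the rule that a candidate location is the first line of each function body are all assumptions chosen for illustration, and every function name below is hypothetical.

```python
import re

# Hypothetical vulnerable snippet (reentrancy style: external call before the state update).
VULNERABLE_SNIPPET = [
    '(bool ok, ) = msg.sender.call{value: balances[msg.sender]}("");',
    "require(ok);",
    "balances[msg.sender] = 0;",
]

# Hypothetical benign contract, held as a list of source lines.
BENIGN_CONTRACT = """\
contract Wallet {
    mapping(address => uint256) balances;
    event Ping(address who);

    function deposit() public payable {
        balances[msg.sender] += msg.value;
    }

    function ping() public {
        emit Ping(msg.sender);
    }
}
""".splitlines()


def wrap_in_trivial_branch(snippet):
    """Semantic-preserving transform: guard the snippet with an always-true condition."""
    return ["if (true) {"] + ["    " + line for line in snippet] + ["}"]


def rename_local(snippet, old, new):
    """Semantic-preserving transform: rename a local identifier consistently."""
    pattern = re.compile(r"\b" + re.escape(old) + r"\b")
    return [pattern.sub(new, line) for line in snippet]


def insertion_points(contract_lines):
    """Assumed rule: one candidate location right after each function's opening brace."""
    return [
        i + 1
        for i, line in enumerate(contract_lines)
        if re.match(r"\s*function\b.*\{\s*$", line)
    ]


def generate_variants(contract_lines, snippet, indent=" " * 8):
    """Build one new contract per candidate location, each with the snippet injected."""
    injected = [indent + line for line in snippet]
    return [
        contract_lines[:point] + injected + contract_lines[point:]
        for point in insertion_points(contract_lines)
    ]


# Apply both transformations, then inject the result at every candidate location.
transformed = wrap_in_trivial_branch(rename_local(VULNERABLE_SNIPPET, "ok", "success"))
variants = generate_variants(BENIGN_CONTRACT, transformed)
```

With this toy contract, `variants` holds two contracts, one per function. A real pipeline would operate on a parsed syntax tree rather than raw lines and would check that each variant still compiles before running it through the detection tools.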