In recent years, the growing number of attacks against smart contracts has heightened the importance of their security. To address this issue, a multitude of static application security testing (SAST) tools have been proposed for detecting vulnerabilities in smart contracts. However, objectively comparing these tools to determine their effectiveness remains challenging. Existing studies often fall short because their taxonomies and benchmarks cover only a coarse and potentially outdated set of vulnerability types, which leads to evaluations that are incomplete and potentially biased. In this paper, we fill this gap by proposing an up-to-date and fine-grained taxonomy that includes 45 unique vulnerability types for smart contracts. Taking it as a baseline, we develop an extensive benchmark that covers 40 distinct types and includes a diverse range of code characteristics, vulnerability patterns, and application scenarios. We then evaluate 8 SAST tools on this benchmark, which comprises 788 smart contract files and 10,394 vulnerabilities. Our results reveal that existing SAST tools fail to detect around 50% of the vulnerabilities in our benchmark and suffer from high false positive rates, with precision not surpassing 10%. We also find that combining the results of multiple tools effectively reduces the false negative rate, at the expense of flagging 36.77 percentage points more functions. Nevertheless, many vulnerabilities, especially those beyond Access Control and Reentrancy, remain undetected. Finally, we highlight the insights from our study, hoping to provide guidance on tool development, enhancement, evaluation, and selection for developers, researchers, and practitioners.