Context: Static Application Security Testing Tools (SASTTs) identify software vulnerabilities to support the security and reliability of software applications. Several studies have suggested that alternative solutions may be more effective than SASTTs because SASTTs tend to generate false alarms, that is, they exhibit low Precision. Aim: We aim to comprehensively evaluate SASTTs, establishing a reliable benchmark for assessing vulnerability identification mechanisms, whether based on SASTTs or on alternatives, and for finding their gaps. Method: Our evaluation of SASTTs is based on a controlled, though synthetic, Java codebase. It involves 1.5 million test executions and introduces methodological innovations such as effort-aware accuracy metrics and method-level analysis. Results: Our findings reveal that SASTTs detect only a narrow range of vulnerabilities. Contrary to prevailing wisdom, SASTTs exhibit high Precision while falling short in Recall. Conclusions: Enhancing Recall, alongside expanding the spectrum of detected vulnerability types, should be the primary focus for improving SASTTs or alternative approaches, such as machine learning-based vulnerability identification solutions.
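To make the Precision and Recall contrast concrete, the following minimal sketch computes both metrics at method level. It is an illustration only, not the evaluation pipeline used in the paper: the function name `precision_recall`, the method identifiers, and the counts are all hypothetical, and returning 0.0 when a denominator is empty is an assumed convention.

```python
def precision_recall(reported, actual):
    """Return (precision, recall) for reported vs. truly vulnerable methods.

    Assumed convention: a metric with an empty denominator is reported as 0.0.
    """
    reported, actual = set(reported), set(actual)
    true_positives = len(reported & actual)
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical data: 10 vulnerable methods exist, and the tool flags only 2,
# both of which are truly vulnerable.
actual = {f"m{i}" for i in range(10)}
reported = {"m0", "m1"}

precision, recall = precision_recall(reported, actual)
print(precision, recall)  # 1.0 0.2
```

In this made-up case the tool raises no false alarms (Precision 1.0) yet misses most vulnerable methods (Recall 0.2), which is the pattern the Results describe.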