Quality assurance (QA) tools are receiving growing attention and are widely used by developers. Given the wide range of available QA solutions, how to evaluate QA tools remains an open question. Most existing research is limited in the following ways: (i) it compares tools without analyzing their scanning rules; (ii) studies disagree on the effectiveness of tools because of differences in study methodology and benchmark datasets; (iii) it does not separately analyze the role of warnings; (iv) there is no large-scale study of time performance. To address these problems, in this paper we systematically select 6 free or open-source tools from a list of 148 existing Java QA tools for a comprehensive study. To evaluate the tools along multiple dimensions, we first map the scanning rules to the Common Weakness Enumeration (CWE) and analyze the coverage and granularity of the rules. We then conduct an experiment on 5 benchmarks, containing 1,425 bugs, to investigate the effectiveness of these tools. Furthermore, we invest substantial effort in investigating the effectiveness of warnings by comparing the warnings with the real labeled bugs and examining their role in bug detection. Finally, we assess the tools' time performance on 1,049 projects. The findings of our study can help developers improve their tools and offer users guidance for selecting QA tools.