Quality assurance (QA) tools are receiving increasing attention and are widely used by developers. Given the wide range of available QA solutions, how to evaluate QA tools remains an open question. Most existing research is limited in the following ways: (i) studies compare tools without analyzing their scanning rules; (ii) studies disagree on the effectiveness of tools because of differences in study methodology and benchmark datasets; (iii) studies do not separately analyze the role of warnings; (iv) there is no large-scale study of time performance. To address these problems, in this paper we systematically select 6 free or open-source tools from a list of 148 existing Java QA tools for a comprehensive study. To evaluate the tools along multiple dimensions, we first map their scanning rules to the Common Weakness Enumeration (CWE) and analyze the coverage and granularity of the rules. We then conduct an experiment on 5 benchmarks containing 1,425 bugs to investigate the effectiveness of these tools. Furthermore, we invest substantial effort in investigating the effectiveness of warnings by comparing them with the real labeled bugs and examining their role in bug detection. Finally, we assess the tools' time performance on 1,049 projects. The findings of our study can help developers improve their tools and provide users with suggestions for selecting QA tools.