Context: Software defect prediction uses historical data to direct software quality assurance resources toward potentially problematic components. Effort-aware (EA) defect prediction prioritizes the more defect-prone components by taking cost-effectiveness into account. In other words, it is a ranking problem; however, existing classification-based ranking strategies give limited consideration to ranking errors. Objective: To improve the performance of classifier-based EA ranking methods by focusing on ranking errors. Method: We propose a ranking score calculation strategy called EA-Z, which sets a lower bound to avoid near-zero ranking errors. We investigate four primary EA ranking strategies with 16 classification learners and conduct experiments comparing EA-Z with these four existing strategies. Results: Experimental results from 72 data sets show that EA-Z is the best ranking score calculation strategy in terms of Recall@20% and Popt when all 16 learners are considered. Among individual learners, the imbalanced ensemble learners UBag-svm and UBst-rf achieve the top performance with EA-Z. Conclusion: Our study indicates that reducing ranking errors is effective for classifier-based effort-aware defect prediction. We recommend using EA-Z with imbalanced ensemble learning.
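To make the ranking view concrete, the following is a minimal sketch of how a Recall@20%-style measure can be computed from classifier outputs. It is not the EA-Z strategy, whose formula the abstract does not give. The function name `recall_at_effort`, the toy data, the use of lines of code as the effort proxy, the density-style score (predicted probability divided by effort), and the rule of stopping at the first module that would exceed the budget are all illustrative assumptions.

```python
from typing import Sequence


def recall_at_effort(
    scores: Sequence[float],
    efforts: Sequence[float],
    defects: Sequence[int],
    budget: float = 0.2,
) -> float:
    """Share of all defects found when modules are inspected in descending
    score order until `budget` (a fraction) of the total effort is spent.

    Assumption: inspection stops at the first module that would exceed
    the effort budget; other cut-off conventions exist.
    """
    total_effort = sum(efforts)
    total_defects = sum(defects)
    if total_effort <= 0 or total_defects <= 0:
        raise ValueError("need positive total effort and at least one defect")

    limit = budget * total_effort
    # Rank module indices from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    spent = 0.0
    found = 0
    for i in order:
        if spent + efforts[i] > limit:
            break
        spent += efforts[i]
        found += defects[i]
    return found / total_defects


# Hypothetical toy data: five modules.
efforts = [100, 50, 400, 40, 410]      # effort proxy, e.g. lines of code
probs = [0.9, 0.6, 0.8, 0.2, 0.1]      # classifier's predicted defect probability
defects = [1, 1, 1, 0, 0]              # ground-truth defect counts

# Assumed density-style ranking score: probability per unit of effort.
density = [p / e for p, e in zip(probs, efforts)]

recall_by_density = recall_at_effort(density, efforts, defects)
recall_by_prob = recall_at_effort(probs, efforts, defects)
print(f"density ranking: {recall_by_density:.3f}")
print(f"probability ranking: {recall_by_prob:.3f}")
```

On this toy data the density-style ranking finds two of the three defects within 20% of the effort, while ranking by probability alone finds one, because the second-ranked module is too large to fit in the budget. This illustrates why the choice of ranking score matters separately from the quality of the classifier.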