Predicting the number of defects in a project is critical for test managers, who must allocate budget, resources, and schedule across testing, support, and maintenance efforts. Software defect prediction models are trained on historical defect-related information and then predict the number of defects in a given project. Most defect prediction studies have focused on predicting defect-prone modules from method-level and class-level static information, whereas this study predicts defects from project-level information based on a cross-company project dataset. The study uses software sizing metrics, effort metrics, and defect density information to develop defect prediction models with various machine learning algorithms. One notable issue in existing defect prediction studies is the lack of transparency in the developed models. To address this, the explainability of the developed model is demonstrated using the state-of-the-art post-hoc, model-agnostic method SHapley Additive exPlanations (SHAP). Finally, the features most important for predicting defects from cross-company project information are identified.
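The attribution idea behind SHAP can be illustrated with a minimal sketch that computes exact Shapley values by enumerating feature coalitions. Everything in it is an assumption made for illustration: the toy function `predict_defects`, its coefficients, the feature names `size_fp`, `effort_hours`, and `defect_density`, and the use of a single reference project as the baseline. It is not the study's model or dataset, and the SHAP library itself typically estimates these values against a background dataset rather than enumerating coalitions.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values of `predict` at `instance`, relative to `baseline`."""
    names = list(instance)
    n = len(names)

    def value(coalition):
        # Features in the coalition take the instance's value;
        # all other features stay at the baseline value.
        mixed = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return predict(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def predict_defects(p):
    # Hypothetical toy predictor with made-up coefficients (illustration only).
    return (0.05 * p["size_fp"]
            + 0.002 * p["effort_hours"]
            + 0.01 * p["size_fp"] * p["defect_density"])


# Hypothetical project to explain and hypothetical reference project.
project = {"size_fp": 400.0, "effort_hours": 3000.0, "defect_density": 1.5}
reference = {"size_fp": 200.0, "effort_hours": 1500.0, "defect_density": 1.0}

phi = shapley_values(predict_defects, project, reference)

# Rank features by the magnitude of their contribution to the prediction.
for name, contribution in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the prediction for the project and the prediction for the reference project, which is the additivity property that makes the ranking interpretable. Exact enumeration grows exponentially with the number of features, so it is practical only for small feature sets like this one.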