Software defect prediction models can assist software testing initiatives by prioritizing the testing of error-prone modules. In recent years, in addition to the traditional approach of predicting defects at the level of classes, modules, etc., Just-In-Time defect prediction research, which focuses on the change history of software products, has gained prominence. When building these defect prediction models, it is important to understand which features are the primary contributors to the classifiers' predictions. This study developed defect prediction models using both the traditional and the Just-In-Time approaches on the publicly available dataset of the Apache Camel project. A multi-layer deep learning algorithm was applied to these datasets and compared with machine learning algorithms. The deep learning algorithm achieved accuracies of 80% and 86%, with area under the receiver operating characteristic curve (AUC) scores of 66% and 78%, for traditional and Just-In-Time defect prediction, respectively. Finally, the feature importance of these models was identified using a model-specific integrated gradients method and a model-agnostic SHapley Additive exPlanations (SHAP) technique.
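To make the integrated gradients idea concrete, the following is a minimal sketch, not the study's implementation. It assumes a toy logistic model in place of the multi-layer network, and the weights, the three-feature input, and the all-zero baseline are hypothetical values chosen only for illustration. Each feature's attribution is the input-minus-baseline difference multiplied by the model's gradient averaged along the straight path from baseline to input.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def predict(x, w, b):
    """Toy stand-in for a trained classifier (assumption: logistic model)."""
    return sigmoid(np.dot(x, w) + b)


def gradient(x, w, b):
    """Analytic gradient of predict() with respect to the input x."""
    p = predict(x, w, b)
    return p * (1.0 - p) * w


def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint Riemann-sum approximation of integrated gradients."""
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints in (0, 1)
    total = np.zeros_like(x, dtype=float)
    for a in alphas:
        # Gradient at a point on the straight path from baseline to x.
        total += gradient(baseline + a * (x - baseline), w, b)
    return (x - baseline) * total / steps


# Hypothetical weights and one hypothetical feature vector.
w = np.array([0.8, -0.5, 1.2])
b = -0.3
x = np.array([1.5, 0.2, 0.9])
baseline = np.zeros_like(x)

attributions = integrated_gradients(x, baseline, w, b)
# Completeness check: attributions should sum to f(x) - f(baseline).
gap = predict(x, w, b) - predict(baseline, w, b)
print(attributions, attributions.sum(), gap)
```

The final line prints the per-feature attributions alongside their sum and the prediction gap; the two should agree closely, which is the completeness property that distinguishes integrated gradients from plain gradient saliency. For a real deep network, the analytic gradient would be replaced by automatic differentiation.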