Feature Importance in the Context of Traditional and Just-In-Time Software Defect Prediction Models

Software defect prediction models can assist software testing initiatives by prioritizing the testing of error-prone modules. In recent years, alongside the traditional approach of predicting defects at the class or module level, Just-In-Time defect prediction, which focuses on the change history of software products, has gained prominence. When building these defect prediction models, it is important to understand which features contribute most to the classifiers' predictions. This study developed defect prediction models using both the traditional and the Just-In-Time approaches on publicly available datasets from the Apache Camel project. A multi-layer deep learning algorithm was applied to these datasets and compared with machine learning algorithms. The deep learning algorithm achieved accuracies of 80% and 86%, with area under the receiver operating characteristic curve (AUC) scores of 66% and 78%, for traditional and Just-In-Time defect prediction, respectively. Finally, the feature importance of these models was identified using a model-specific Integrated Gradients method and a model-agnostic SHapley Additive exPlanations (SHAP) technique.
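To make the model-specific attribution step concrete, the following is a minimal sketch of Integrated Gradients. It is an illustration only, not the study's implementation: a toy logistic model stands in for the multi-layer network, and the weights, bias, feature values, and all-zeros baseline are hypothetical. For each feature, the method multiplies the gap between input and baseline by the model's gradient averaged along the straight path between them.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Toy logistic model standing in for a defect classifier (assumption)."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def gradient(weights, bias, x):
    """Analytic gradient of the logistic output with respect to each input."""
    p = predict(weights, bias, x)
    return [p * (1.0 - p) * w for w in weights]


def integrated_gradients(weights, bias, x, baseline, steps=200):
    """Midpoint Riemann-sum approximation of Integrated Gradients."""
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # position along the baseline-to-input path
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(gradient(weights, bias, point)):
            totals[i] += g
    # Scale the average gradient by the input-baseline difference per feature.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


# Hypothetical weights and feature values, for illustration only.
WEIGHTS = [0.8, -0.5, 1.2]
BIAS = -0.3
x = [1.0, 2.0, 0.5]
baseline = [0.0, 0.0, 0.0]

attributions = integrated_gradients(WEIGHTS, BIAS, x, baseline)
gap = predict(WEIGHTS, BIAS, x) - predict(WEIGHTS, BIAS, baseline)
print(attributions, sum(attributions), gap)
```

A useful sanity check is the completeness property: the attributions should sum, up to numerical error, to the difference between the model's output at the input and at the baseline. For a real deep network the gradients would come from automatic differentiation rather than a closed-form expression.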
