This work stems from three observations on prior Just-In-Time Software Defect Prediction (JIT-SDP) models. First, prior studies treat JIT-SDP solely as a classification problem. Second, prior JIT-SDP studies do not consider that class balancing may change the underlying characteristics of software changeset data. Third, prior JIT-SDP incremental learning models address only a single source of concept drift, the evolution of class imbalance. We propose an incremental learning framework for JIT-SDP called CPI-JIT. First, in addition to a classification modeling component, the framework includes a time-series forecasting component that learns the temporal interdependencies among changesets. Second, the framework features a purpose-built over-sampling technique based on SMOTE and principal curves, called SMOTE-PC, which preserves the underlying distribution of software changeset data. Within this framework, we propose an incremental deep neural network model called DeepICP. In an evaluation on \numprojs software projects, we show that: 1) SMOTE-PC improves the model's predictive performance; 2) for some software projects, defect prediction benefits from exploiting the temporal interdependencies among changesets; and 3) principal curves summarize the underlying distribution of changeset data and reveal a new source of concept drift, to which the DeepICP model is designed to adapt.
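To illustrate the over-sampling step that SMOTE-PC builds on, the following is a minimal sketch of plain SMOTE interpolation in NumPy. It is not the SMOTE-PC technique itself: the principal-curve component is omitted because the abstract does not specify it. The function name `smote_oversample` and its parameters (`n_new`, `k`, `seed`) are hypothetical and chosen only for this example.

```python
import numpy as np


def smote_oversample(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by plain SMOTE interpolation.

    Illustrative only: this is standard SMOTE, not SMOTE-PC.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)  # cannot have more neighbors than other samples

    # Pairwise Euclidean distances between minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor

    # Indices of the k nearest minority neighbors of each sample.
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # Pick a random base sample and one of its neighbors for each new point.
    base = rng.integers(0, n, size=n_new)
    nb = neighbors[base, rng.integers(0, k, size=n_new)]

    # Place the synthetic point at a random position on the connecting segment.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[nb] - X_min[base])


# Usage: four defect-inducing changesets described by two toy features.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_oversample(X_min, n_new=10, k=2)
```

Every synthetic point lies on a segment between two existing minority samples, so plain SMOTE can shift the data distribution when neighbors straddle sparse regions. That is the kind of distortion a distribution-preserving variant would aim to avoid.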