Previous work on Just-In-Time (JIT) defect prediction has primarily focused on the performance of individual pre-trained models, without exploring how different pre-trained models relate to one another when used as backbones. In this study, we build six models, RoBERTaJIT, CodeBERTJIT, BARTJIT, PLBARTJIT, GPT2JIT, and CodeGPTJIT, each with a distinct pre-trained model as its backbone, and systematically explore the differences and connections between them. Specifically, we investigate how the models perform when given commit code and commit messages as inputs, as well as the relationship between training efficiency and model distribution across the six models. We also conduct an ablation experiment to examine each model's sensitivity to its inputs, and we investigate how the models perform in zero-shot and few-shot scenarios. Our findings indicate that models built on each of the different backbones show improvements, and that when the backbones' pre-trained models are similar, the training resources they consume are much closer. We also observe that commit code plays a significant role in defect detection, and that in few-shot scenarios the different pre-trained models detect defects better when the dataset is balanced. These results provide new insights for optimizing JIT defect prediction with pre-trained models and highlight the factors that require more attention when constructing such models. In addition, with 2,000 training samples, CodeGPTJIT and GPT2JIT achieve better performance than DeepJIT and CC2Vec on the two datasets, respectively. These findings emphasize the effectiveness of transformer-based pre-trained models in JIT defect prediction, especially when training data is limited.