Multi-View Adaptive Contrastive Learning for Information Retrieval Based Fault Localization

Most studies have focused on information retrieval-based techniques for fault localization, which build representations for bug reports and source code files and match their semantic vectors through similarity measurement. However, such approaches often ignore useful information that might help improve localization performance, such as 1) the interaction relationship between bug reports and source code files; 2) the similarity relationship between bug reports; and 3) the co-citation relationship between source code files. In this paper, we propose a novel approach named Multi-View Adaptive Contrastive Learning for Information Retrieval Fault Localization (MACL-IRFL) to learn these relationships for software fault localization. Specifically, we first generate data augmentations from the report-code interaction view, the report-report similarity view and the code-code co-citation view separately, and adopt a graph neural network to aggregate the information of bug reports or source code files from the three views in the embedding process. Moreover, we perform contrastive learning across these views. Our contrastive learning task forces the bug report representations to encode the information shared by the report-report and report-code views, and the source code file representations to encode the information shared by the code-code and report-code views, thereby alleviating the noise from auxiliary information. Finally, to evaluate the performance of our approach, we conduct extensive experiments on five open-source Java projects. The results show that our model improves over the best baseline by up to 28.93%, 25.57% and 20.35% on Accuracy@1, MAP and MRR, respectively.
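For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how Accuracy@1, MAP and MRR are commonly computed for ranked fault localization results. It is not taken from the MACL-IRFL implementation: the function names, the file names and the toy rankings are hypothetical, and average precision is assumed to be normalized by the number of truly buggy files per report, which is one common convention.

```python
from typing import Dict, List, Sequence, Set, Tuple


def accuracy_at_k(ranked: Sequence[str], buggy: Set[str], k: int = 1) -> float:
    """1.0 if at least one truly buggy file appears in the top-k results, else 0.0."""
    return 1.0 if any(path in buggy for path in ranked[:k]) else 0.0


def reciprocal_rank(ranked: Sequence[str], buggy: Set[str]) -> float:
    """Inverse of the 1-based rank of the first buggy file (0.0 if none is retrieved)."""
    for rank, path in enumerate(ranked, start=1):
        if path in buggy:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: Sequence[str], buggy: Set[str]) -> float:
    """Mean of precision@rank over the ranks where a buggy file is found.

    Assumption: normalized by the number of truly buggy files.
    """
    hits, total = 0, 0.0
    for rank, path in enumerate(ranked, start=1):
        if path in buggy:
            hits += 1
            total += hits / rank
    return total / len(buggy) if buggy else 0.0


def evaluate(results: List[Tuple[Sequence[str], Set[str]]]) -> Dict[str, float]:
    """Average the per-report scores over all bug reports."""
    n = len(results)
    return {
        "Accuracy@1": sum(accuracy_at_k(r, b, 1) for r, b in results) / n,
        "MAP": sum(average_precision(r, b) for r, b in results) / n,
        "MRR": sum(reciprocal_rank(r, b) for r, b in results) / n,
    }


# Hypothetical toy data: (ranked file list, set of truly buggy files) per bug report.
toy = [
    (["A.java", "B.java", "C.java"], {"A.java"}),
    (["B.java", "A.java", "C.java"], {"A.java", "C.java"}),
]

print(evaluate(toy))
```

In this toy case the first report has its buggy file ranked first, while the second report has its first buggy file at rank 2, so Accuracy@1 is 0.5 and MRR is 0.75. The percentages quoted in the abstract are improvements over the best baseline on these metrics, not the metric values themselves.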
