ARFT-Transformer: Modeling Metric Dependencies for Cross-Project Aging-Related Bug Prediction

Software systems that run for long periods often suffer from software aging, which is typically caused by Aging-Related Bugs (ARBs). To mitigate the risk of ARBs early in development, ARB prediction has been introduced into software aging research. However, because ARBs are difficult to collect, within-project ARB prediction suffers from data scarcity, which has motivated cross-project ARB prediction. This task faces two major challenges: 1) a domain adaptation issue caused by distribution differences between source and target projects; and 2) severe class imbalance between ARB-prone and ARB-free samples. Although various methods have been proposed for cross-project ARB prediction, existing approaches treat the input metrics independently and often neglect the rich inter-metric dependencies, which can lead to overlapping information and misjudgment of metric importance, potentially degrading model performance. Moreover, they typically train with a cross-entropy loss, which does not distinguish easy-to-classify samples from hard ones. To overcome these limitations, we propose ARFT-Transformer, a transformer-based cross-project ARB prediction framework that introduces a metric-level multi-head attention mechanism to capture metric interactions and incorporates the Focal Loss function to handle class imbalance. Experiments on three large-scale open-source projects show that ARFT-Transformer on average outperforms state-of-the-art cross-project ARB prediction methods in both single-source and multi-source cases, improving the Balance metric by up to 29.54% and 19.92%, respectively.
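
To illustrate why a focal loss can help where cross-entropy does not separate easy from hard samples, the following is a minimal sketch of the standard binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). It is not the authors' implementation: the function names and the values gamma = 2.0 and alpha = 0.75 are illustrative assumptions, not settings reported in the abstract.

```python
import math

EPS = 1e-12  # keeps log() away from zero


def _prob_of_true_class(p_pos, label):
    """Return p_t, the predicted probability assigned to the true class."""
    p_t = p_pos if label == 1 else 1.0 - p_pos
    return min(max(p_t, EPS), 1.0 - EPS)


def cross_entropy(p_pos, label):
    """Plain binary cross-entropy for one sample."""
    return -math.log(_prob_of_true_class(p_pos, label))


def focal_loss(p_pos, label, gamma=2.0, alpha=0.75):
    """Binary focal loss for one sample.

    p_pos: predicted probability of the positive (ARB-prone) class
    label: 1 for ARB-prone, 0 for ARB-free
    gamma: focusing parameter (assumed value); gamma = 0 disables focusing
    alpha: weight on the positive class (assumed value)
    """
    p_t = _prob_of_true_class(p_pos, label)
    alpha_t = alpha if label == 1 else 1.0 - alpha
    # (1 - p_t)^gamma shrinks the loss of well-classified samples
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy positive (p = 0.9) and a hard positive (p = 0.1)
easy_ce, hard_ce = cross_entropy(0.9, 1), cross_entropy(0.1, 1)
easy_fl, hard_fl = focal_loss(0.9, 1), focal_loss(0.1, 1)
print(f"cross-entropy hard/easy ratio: {hard_ce / easy_ce:.1f}")
print(f"focal loss    hard/easy ratio: {hard_fl / easy_fl:.1f}")
```

With gamma = 0 the modulating factor equals 1 and the expression reduces to an alpha-weighted cross-entropy, so the two losses differ only in how strongly they down-weight samples the model already classifies confidently.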
