To alleviate the cost of regression testing in continuous integration (CI), a large number of machine learning-based (ML-based) test case prioritization techniques have been proposed. However, it remains unknown how they perform under the same experimental setup, because they have been evaluated on different datasets with different metrics. To bridge this gap, we conduct the first comprehensive study of these ML-based techniques. We investigate the performance of 11 representative ML-based prioritization techniques for CI on 11 open-source subjects and obtain a series of findings. For example, the performance of the techniques changes across CI cycles, mainly because the amount of training data changes, rather than because of code evolution or test removal/addition. Based on the findings, we give actionable suggestions for enhancing the effectiveness of ML-based techniques. For instance, pretraining a prioritization technique with cross-subject data so that it is thoroughly trained, and then finetuning it with within-subject data, dramatically improves its performance. In particular, the pretrained MART achieves state-of-the-art performance, producing the optimal sequence on 80% of the subjects, while the existing best technique, the original MART, produces the optimal sequence on only 50% of the subjects.
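The pretrain-then-finetune suggestion above can be illustrated with a minimal sketch. It is not the study's implementation: it swaps MART for a tiny logistic-regression scorer trained by plain SGD so that it runs with the standard library alone, and the two features (`last_failed`, `fail_rate`), the data, and the test names are all hypothetical. The point is only the workflow: train on cross-subject data first, then continue training from those parameters on within-subject data, and rank tests by predicted failure probability.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Predicted probability that a test fails in the next CI cycle."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train(weights, bias, data, epochs=50, lr=0.1):
    """Plain SGD on the logistic loss, starting from the given parameters."""
    weights = list(weights)
    for _ in range(epochs):
        for features, failed in data:
            err = failed - predict(weights, bias, features)
            weights = [w + lr * err * x for w, x in zip(weights, features)]
            bias += lr * err
    return weights, bias


def prioritize(weights, bias, tests):
    """Return test names ordered by descending predicted failure probability."""
    ranked = sorted(tests, key=lambda t: predict(weights, bias, t[1]), reverse=True)
    return [name for name, _ in ranked]


# Hypothetical data: ([last_failed, fail_rate], failed_in_this_cycle).
cross_subject = [
    ([1, 0.8], 1), ([0, 0.1], 0), ([1, 0.6], 1),
    ([0, 0.0], 0), ([0, 0.2], 0), ([1, 0.5], 1),
]
within_subject = [
    ([1, 0.4], 1), ([0, 0.1], 0), ([0, 0.0], 0),
]

# Step 1: pretrain from scratch on data pooled from other subjects.
pre_w, pre_b = train([0.0, 0.0], 0.0, cross_subject)

# Step 2: finetune, i.e. continue training from the pretrained parameters
# on the (much smaller) data of the subject under test.
w, b = train(pre_w, pre_b, within_subject, epochs=20)

# Step 3: prioritize the tests of the next CI cycle.
candidates = [("test_search", [0, 0.1]), ("test_login", [1, 0.1])]
order = prioritize(w, b, candidates)
print(order)
```

The design choice worth noting is that finetuning starts from `pre_w` and `pre_b` rather than from zeros, so a subject with few CI cycles still benefits from what was learned elsewhere. With a tree-ensemble learner such as MART, the analogous step would be to keep the pretrained trees and fit additional ones on within-subject data.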