Test case prioritisation (TCP) is a critical task in regression testing to ensure quality as software evolves. Machine learning has become a common way to achieve it. In particular, learning-to-rank (LTR) algorithms provide an effective method of ordering and prioritising test cases. However, their use poses a challenge in terms of explainability, both globally at the model level and locally for particular results. Here, we present and discuss scenarios that require different explanations, and how the particularities of TCP (multiple builds over time, test case and test suite variations, etc.) could influence them. We include a preliminary experiment analysing the similarity of explanations, which shows that they vary not only with test case-specific predictions but also with the relative ranks.