Flaky tests can pass or fail non-deterministically without any change to the software under test. Developers encounter such tests frequently, and they undermine the credibility of test suites. State-of-the-art research applies machine learning to flaky test detection and achieves reasonably good accuracy. Moreover, most automated flaky test repair solutions are designed for specific types of flaky tests, which makes the category of a flaky test relevant to its repair. This work proposes a novel categorization framework, called FlaKat, which uses machine-learning classifiers to predict, quickly and accurately, the category of a given flaky test, where the category reflects the root cause of the flakiness. Sampling techniques are applied to address the imbalance between flaky test categories in the International Dataset of Flaky Tests (IDoFT). A new evaluation metric, called Flakiness Detection Capacity (FDC), is proposed for measuring the accuracy of classifiers from an information-theoretic perspective, and a proof of its effectiveness is provided. The final FDC results also agree with the F1 score on which classifier yields the best flakiness classification.
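To make the class-imbalance and F1 steps concrete, the following is a minimal sketch and not FlaKat's implementation. The abstract does not name the sampling technique, so plain random oversampling is used here as an assumption; the function names `random_oversample` and `macro_f1` and the category labels are hypothetical placeholders. FDC is not sketched because its definition is not given in this abstract.

```python
import random
from collections import defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-category samples at random until every
    category is as large as the largest one (assumed technique)."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for x, y in zip(samples, labels):
        by_label[y].append(x)
    target = max(len(xs) for xs in by_label.values())

    out_x, out_y = [], []
    for y, xs in by_label.items():
        # Draw extra copies with replacement from the same category.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-category F1 = 2TP / (2TP + FP + FN)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical, imbalanced toy data: 6 / 2 / 1 tests per category.
    labels = ["cat_a"] * 6 + ["cat_b"] * 2 + ["cat_c"]
    samples = list(range(len(labels)))
    _, balanced = random_oversample(samples, labels)
    print({c: balanced.count(c) for c in sorted(set(balanced))})
    print(macro_f1(["cat_a", "cat_a", "cat_b", "cat_b"],
                   ["cat_a", "cat_b", "cat_b", "cat_b"]))
```

Oversampling should be applied to the training split only, so that duplicated samples do not leak into the evaluation data. The macro average is chosen here because it weights rare categories equally with common ones; whether the paper reports macro or weighted F1 is not stated in the abstract.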