AI-powered systems have gained widespread popularity in various domains, including Autonomous Vehicles (AVs). However, their complexity makes it challenging to ensure their reliability and safety. Conventional test adequacy metrics, designed to evaluate the effectiveness of traditional software testing, are often insufficient or impractical for these systems. White-box metrics designed specifically for these systems leverage neuron coverage information, so they require access to the underlying AI model and training data, which may not always be available. Furthermore, existing adequacy metrics correlate only weakly with the fault-detection ability of the generated test suites, and this study aims to bridge that gap. In this paper, we introduce a set of black-box test adequacy metrics, called "Test suite Instance Space Adequacy" (TISA) metrics, which can be used to gauge the effectiveness of a test suite. The TISA metrics assess both the diversity and coverage of a test suite and the range of bugs detected during testing. We also introduce a framework that lets testers visualise the diversity and coverage of a test suite in a two-dimensional space, making it easier to identify areas that need improvement. We evaluate the TISA metrics by examining their correlation with the number of bugs detected in system-level simulation testing of AVs. The strong correlation, together with the short computation time, indicates that the metrics are effective and efficient for estimating the adequacy of AV testing.