The rise of self-driving cars (SDCs) raises important safety challenges that must be addressed in dynamic environments. While field testing is essential, current methods lack diversity in the critical SDC scenarios they assess. Prior research introduced simulation-based testing for SDCs; among these approaches, Frenetic, a test generator based on a Frenet space encoding, achieves a relatively high percentage of valid tests (approximately 50%), characterized by naturally smooth curves. The "minimal out-of-bound distance" is often used as the fitness function, which we argue is a sub-optimal metric. Instead, we show that the likelihood of a test leading to an out-of-bound condition can be learned by a vanilla transformer model. We combine this "inherently learned metric" with a genetic algorithm, which has been shown to produce a high diversity of tests. To validate our approach, we conducted a large-scale empirical evaluation on a dataset of over 1,174 simulated test cases created to challenge SDC behavior. Our results show that the approach substantially reduces the number of invalid test cases generated, increases test diversity, and identifies safety violations with high accuracy during SDC test execution.
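To make the combination of a learned fitness metric and a genetic algorithm concrete, the following is a minimal sketch and not the implementation evaluated in this work. It assumes a road is encoded as a list of per-segment curvature values, in the spirit of a Frenet-style encoding. The names `surrogate_fitness`, `evolve`, `mutate`, `crossover`, and the bound `MAX_KAPPA` are hypothetical. In particular, `surrogate_fitness` is a hand-written placeholder standing in for the trained transformer, which would instead return a predicted likelihood of an out-of-bound condition.

```python
import random
from typing import Callable, List

Road = List[float]   # assumed encoding: one curvature value per road segment
MAX_KAPPA = 0.07     # hypothetical curvature bound, chosen only for illustration


def surrogate_fitness(road: Road) -> float:
    """Placeholder for the learned metric: returns a score in [0, 1].

    A trained model would predict the out-of-bound likelihood here; this
    stand-in simply scores roads with higher mean absolute curvature higher.
    """
    mean_abs = sum(abs(k) for k in road) / len(road)
    return min(1.0, mean_abs / MAX_KAPPA)


def mutate(road: Road, rng: random.Random, sigma: float = 0.01) -> Road:
    """Add Gaussian noise to each curvature value and clamp to the bound."""
    return [max(-MAX_KAPPA, min(MAX_KAPPA, k + rng.gauss(0.0, sigma))) for k in road]


def crossover(a: Road, b: Road, rng: random.Random) -> Road:
    """Single-point crossover between two roads of equal length."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(fitness: Callable[[Road], float], pop_size: int = 20, length: int = 15,
           generations: int = 30, seed: int = 0) -> List[Road]:
    """Run a simple elitist genetic algorithm; return roads sorted best first."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-MAX_KAPPA, MAX_KAPPA) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]          # keep the better half unchanged
        children: List[Road] = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)       # two distinct parents
            children.append(mutate(crossover(a, b, rng), rng))
        pop = elite + children
    return sorted(pop, key=fitness, reverse=True)


if __name__ == "__main__":
    best = evolve(surrogate_fitness)[0]
    print(f"best surrogate score: {surrogate_fitness(best):.3f}")
```

Because the fitness function is passed in as a callable, replacing the placeholder with a model's prediction function would not require changes to the search loop. The sketch does not check road validity or measure test diversity, both of which a real test generator would need.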