Evaluating Decision Optimality of Autonomous Driving via Metamorphic Testing

Autonomous Driving System (ADS) testing is crucial in ADS development, and its primary focus to date has been safety. However, evaluating non-safety-critical performance, particularly the ADS's ability to make optimal decisions and produce optimal paths for autonomous vehicles (AVs), is equally vital for ensuring that AVs behave intelligently and for reducing their risks. Little work has been dedicated to assessing the optimal decision-making performance of ADSs, owing to the lack of corresponding oracles and the difficulty of generating scenarios with non-optimal decisions. In this paper, we focus on evaluating the decision-making quality of an ADS and propose the first method for detecting non-optimal decision scenarios (NoDSs), in which the ADS does not compute optimal paths for AVs. First, to address the oracle problem, we propose a novel metamorphic relation (MR) aimed at exposing violations of optimal decisions. The MR captures the property that the ADS should retain its optimal decisions when the optimal path remains unaffected by non-invasive changes. Second, we develop a new framework, Decictor, designed to generate NoDSs efficiently. Decictor comprises three main components: Non-invasive Mutation, MR Check, and Feedback. Non-invasive Mutation ensures that the original optimal path is not affected in the mutated scenarios, while MR Check determines whether non-optimal decisions are made. To identify NoDSs more effectively, we design a feedback metric that combines the spatial and temporal aspects of the AV's movement. We evaluate Decictor on Baidu Apollo, an open-source, production-grade ADS. The experimental results validate the effectiveness of Decictor in detecting non-optimal decisions of ADSs. Our work provides valuable and original insights into evaluating the non-safety-critical performance of ADSs.
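
To make the idea behind the MR concrete, the following is a minimal illustrative sketch, not Decictor's actual implementation. It assumes that a driving path is a list of 2D waypoints, that optimality is approximated by path length and travel time, and that a relative tolerance absorbs run-to-run noise. The names `path_length` and `violates_mr` and the `tolerance` parameter are hypothetical and introduced only for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def path_length(path: List[Point]) -> float:
    """Sum of Euclidean distances between consecutive waypoints."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def violates_mr(
    original_path: List[Point],
    mutated_path: List[Point],
    original_time: float,
    mutated_time: float,
    tolerance: float = 0.1,  # assumed relative slack, not taken from the paper
) -> bool:
    """Hypothetical MR check.

    The mutated scenario is assumed to differ from the original only by
    non-invasive changes, so the original optimal path is still available.
    The run is flagged as a candidate non-optimal decision scenario when
    the AV's path is longer, or its travel time is greater, than in the
    original run by more than the tolerance.
    """
    longer = path_length(mutated_path) > (1 + tolerance) * path_length(original_path)
    slower = mutated_time > (1 + tolerance) * original_time
    return longer or slower


# Usage: a straight path in the original scenario versus a detour taken
# in the mutated scenario, with identical travel times.
straight = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
detour = [(0.0, 0.0), (10.0, 5.0), (20.0, 0.0)]
flagged = violates_mr(straight, detour, original_time=10.0, mutated_time=10.0)
```

In this sketch the detour is about 22.4 units long against 20 for the straight path, which exceeds the assumed 10% tolerance, so the mutated run is flagged. A real check would need a comparison criterion suited to the ADS under test and its simulator.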
