DeepQTest: Testing Autonomous Driving Systems with Reinforcement Learning and Real-world Weather Data

Autonomous driving systems (ADSs) sense their environment and make driving decisions autonomously. These systems are safety-critical, and testing is one of the main ways to ensure their safety. However, because ADSs are inherently complex and their operating environment is high-dimensional, the number of possible test scenarios is infinite. Moreover, the operating environment is dynamic, continuously evolving, and full of uncertainties, so a testing approach must adapt to it. In addition, existing ADS testing techniques have limited effectiveness in ensuring that test scenarios are realistic, especially with respect to weather conditions and how they change over time. Recently, reinforcement learning (RL) has shown great potential for challenging problems, especially those requiring constant adaptation to dynamic environments. To this end, we present DeepQTest, a novel ADS testing approach that uses RL to learn environment configurations with a high chance of revealing abnormal ADS behaviors. Specifically, DeepQTest employs Deep Q-Learning and adopts three safety and comfort measures to construct its reward functions. To ensure the realism of generated scenarios, DeepQTest defines a set of realistic constraints and introduces real-world weather conditions into the simulated environment. We evaluated DeepQTest on an industrial-scale ADS against three baselines: random, greedy, and DeepCollision, a state-of-the-art RL-based approach. The results show that DeepQTest was significantly more effective than the baselines at generating scenarios that lead to collisions and at ensuring scenario realism. Among the three reward functions implemented in DeepQTest, our study recommends Time-To-Collision as the best design.
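
To make the Time-To-Collision measure concrete, the following is a minimal sketch of how such a quantity could be turned into a reward for a testing agent. It is an illustration under stated assumptions, not the formulation used in DeepQTest: the function names, the constant-closing-speed model, the 10-second horizon, and the linear mapping to [0, 1] are all hypothetical choices made here.

```python
import math


def time_to_collision(gap_m: float, closing_speed_mps: float) -> float:
    """Return the time to collision in seconds.

    Assumes (hypothetically) a constant closing speed between the ego
    vehicle and an obstacle. If the gap is not shrinking, no collision
    is predicted and infinity is returned.
    """
    if closing_speed_mps <= 0.0:
        return math.inf
    return max(gap_m, 0.0) / closing_speed_mps


def ttc_reward(ttc_s: float, horizon_s: float = 10.0) -> float:
    """Map a TTC value to a reward in [0, 1].

    A testing agent is rewarded more as the ADS gets closer to a
    collision: 1.0 when TTC is zero, 0.0 at or beyond the horizon.
    The horizon and the linear shape are illustrative assumptions.
    """
    if ttc_s >= horizon_s:
        return 0.0
    return 1.0 - ttc_s / horizon_s


if __name__ == "__main__":
    ttc = time_to_collision(gap_m=20.0, closing_speed_mps=5.0)
    print(ttc, ttc_reward(ttc))
```

The reward grows as the predicted collision gets nearer, so an agent maximizing it is pushed toward environment configurations that bring the ADS closer to unsafe situations.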
