MARL-OT: Multi-Agent Reinforcement Learning Guided Online Fuzzing to Detect Safety Violation in Autonomous Driving Systems

Autonomous Driving Systems (ADSs) are safety-critical: real-world safety violations can result in significant losses. Rigorous testing is therefore essential before deployment, and simulation testing plays a key role. However, ADSs are typically complex, consisting either of multiple modules such as perception and planning or of a well-trained end-to-end model. Offline methods, such as those based on the Genetic Algorithm (GA), can only generate predefined trajectories for dynamic objects; because of their evolutionary nature, they struggle to trigger ADS safety violations rapidly and efficiently across different scenarios. Online methods, such as single-agent reinforcement learning (RL), can quickly adjust the trajectories of dynamic objects at run time to adapt to different scenarios, but they struggle to capture complex ADS corner cases that arise from the intricate interplay among multiple vehicles. Multi-agent reinforcement learning (MARL) is well suited to cooperative tasks, but it faces challenges of its own, particularly with convergence. This paper introduces MARL-OT, a scalable framework that leverages MARL to detect ADS safety violations caused by cooperation among surrounding vehicles. MARL-OT employs MARL for high-level guidance, triggering various dangerous scenarios in which a rule-based online fuzzer explores potential ADS safety violations, thereby generating dynamic, realistic safety violation scenarios. Our approach improves the detected safety violation rate by up to 136.2% compared to the state-of-the-art (SOTA) testing technique.
