In this work, we present SafePlanner, a systematic testing framework for identifying safety-critical flaws in the Plan model of Automated Driving Systems (ADS). SafePlanner addresses two core challenges: generating structurally meaningful test scenarios and detecting hazardous planning behaviors. To maximize coverage, SafePlanner performs a structural analysis of the Plan model implementation (specifically, its scene-transition logic and hierarchical control flow) and uses this insight to extract feasible scene transitions directly from the code. It then composes test scenarios by combining these transitions with non-player-character (NPC) vehicle behaviors, and applies guided fuzzing to explore the Plan model's behavioral space under these scenarios. We evaluate SafePlanner on Baidu Apollo, a production-grade Level 4 ADS. SafePlanner generates 20,635 test cases and detects 520 hazardous behaviors, which manual analysis groups into 15 root causes. For four of these, we applied patches based on our analysis; the issues disappeared, and we observed no apparent side effects. SafePlanner achieves 83.63% function coverage and 63.22% decision coverage on the Plan model, outperforming baselines in both bug discovery and efficiency.