Self-driving vehicles (SDVs) are becoming a reality but still suffer from "long-tail" challenges in natural driving: an SDV will continually encounter rare, safety-critical cases that may not be covered by the dataset it was trained on. Some safety-assurance planners address this problem by being conservative in all possible cases, which can significantly reduce driving mobility. To this end, this work proposes a method that automatically adjusts the conservative level according to each case's "long-tail" rate, named the dynamically conservative planner (DCP). We first define the "long-tail" rate as an SDV's confidence in passing a driving case. The rate indicates the probability of safety-critical events and is estimated from historical data using the statistical bootstrap method. Then, a reinforcement learning-based planner is designed to contain candidate policies with different conservative levels. The final policy is optimized based on the estimated "long-tail" rate. In this way, the DCP automatically becomes more conservative in low-confidence "long-tail" cases while remaining efficient otherwise. The DCP is evaluated in the CARLA simulator on driving cases whose training data follow a "long-tail" distribution. The results show that the DCP can accurately estimate the "long-tail" rate and thereby identify potential risks. Based on the rate, the DCP automatically avoids potential collisions in "long-tail" cases by making conservative decisions, without reducing the average velocity in other, typical cases. Thus, the DCP is safer and more efficient than baselines with fixed conservative levels, e.g., an always-conservative planner. This work provides a technique to guarantee an SDV's performance in unexpected driving cases without resorting to a globally conservative setting, which contributes to solving the "long-tail" problem in practice.
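The two steps described above, bootstrapping a confidence estimate from historical data and then choosing how conservative to be, can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the 0/1 outcome encoding (1 = safety-critical event), the percentile bootstrap, the 0.05 risk threshold, and the two policy labels are all assumptions made for illustration, and a simple threshold rule stands in for the reinforcement learning-based policy optimization.

```python
import random


def bootstrap_risk_bound(outcomes, n_resamples=1000, quantile=0.95, seed=0):
    """Percentile-bootstrap upper bound on the rate of safety-critical events.

    outcomes: historical results for similar driving cases (hypothetical
    encoding: 1 = safety-critical event occurred, 0 = case passed safely).
    """
    if not outcomes:
        return 1.0  # assumption: no history means lowest confidence
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample the history with replacement and record each resample's event rate.
    rates = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    index = min(n_resamples - 1, int(quantile * n_resamples))
    return rates[index]


def select_policy(risk_bound, max_risk=0.05):
    """Pick a candidate policy label from the estimated risk bound.

    Hypothetical rule: stay efficient only when the bound is small enough.
    """
    return "efficient" if risk_bound <= max_risk else "conservative"


if __name__ == "__main__":
    typical_case = [0] * 200   # many safe passes recorded
    rare_case = [1, 0, 1]      # little history, events observed
    for name, history in [("typical", typical_case), ("rare", rare_case)]:
        bound = bootstrap_risk_bound(history)
        print(name, select_policy(bound))
```

In this sketch a case with a long, clean history keeps the efficient policy, while a case with sparse or unfavourable history falls back to the conservative one, which mirrors the behaviour the abstract attributes to the DCP.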