Coping with highly interactive scenarios is one of the significant challenges in the development of autonomous driving. Reinforcement learning (RL) offers an ideal solution for such scenarios through its self-evolution mechanism based on interaction with the environment. However, common RL methods lack sufficient safety mechanisms, so agents often struggle to interact well in highly dynamic environments and may collide in pursuit of short-term rewards. Most existing safe RL methods require environment modeling to generate reliable safety boundaries that constrain agent behavior. Nevertheless, acquiring such safety boundaries is not always feasible in dynamic environments. Inspired by drivers' tendency to act when uncertainty is minimal, this study introduces the concept of action timing to replace explicit safety-boundary modeling. We define the "actor" as an agent that decides the optimal action at each step. By modeling the actor's opportunity to act as a timing-dependent gradual process, a second agent, called the "timing taker," can evaluate the optimal time to execute an action and relate this optimal timing to each action moment as a dynamic safety factor that constrains the actor's behavior. In an experiment involving complex interactions at an unsignalized intersection, this framework achieved superior safety performance compared with all benchmark models.
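To make the two-agent structure concrete, the following is a minimal, hypothetical Python sketch of one plausible way the timing taker's output could gate the actor's actions. The class names (Actor, TimingTaker), the placeholder policies, and the multiplicative gating are all illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical sketch of the actor / timing-taker interplay described above.
# Both policies are stubbed with simple closed-form functions; in the paper
# these would be learned RL agents.

class Actor:
    """Proposes a candidate action at each step (placeholder policy)."""
    def act(self, observation: np.ndarray) -> np.ndarray:
        # Stand-in for a learned policy, e.g. throttle/steering in [-1, 1].
        return np.tanh(observation[:2])

class TimingTaker:
    """Estimates how opportune the current moment is for acting.

    Returns a dynamic safety factor in [0, 1]: near 0 when interaction
    uncertainty is high (wait), near 1 when the timing is right (act).
    """
    def safety_factor(self, observation: np.ndarray) -> float:
        # Stand-in for a learned timing value; here a sigmoid of one feature.
        return float(1.0 / (1.0 + np.exp(-observation[2])))

def constrained_action(actor: Actor, timing: TimingTaker,
                       observation: np.ndarray) -> np.ndarray:
    """Attenuate the actor's proposed action by the dynamic safety factor.

    Multiplicative gating is only one possible constraint mechanism; the
    paper may relate the timing signal to the action differently.
    """
    return timing.safety_factor(observation) * actor.act(observation)

# Usage: at low timing confidence the executed action is pulled toward a
# conservative near-zero action; at high confidence it passes through.
obs = np.array([0.4, -0.2, 1.5])
print(constrained_action(Actor(), TimingTaker(), obs))
```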