In autonomous driving, offline reinforcement learning (RL) approaches are notably effective at solving sequential decision-making problems from offline datasets. However, maintaining safety across diverse safety-critical scenarios remains a significant challenge, because long-tailed and unforeseen scenarios are absent from offline datasets. In this paper, we introduce the saFety-aware strUctured Scenario representatION (FUSION), a pioneering representation learning method for offline RL that facilitates learning a generalizable end-to-end driving policy by leveraging structured scenario information. FUSION exploits the causal relationships among the decomposed reward, cost, state, and action spaces to construct a framework for structured sequential reasoning in dynamic traffic environments. We conduct extensive evaluations in two typical real-world settings of distribution shift in autonomous vehicles and show that FUSION achieves a good balance between safety cost and utility reward compared with current state-of-the-art safe RL and imitation learning (IL) baselines. Empirical evidence across various driving scenarios indicates that FUSION significantly enhances the safety and generalizability of autonomous driving agents, even in challenging and unseen environments. Furthermore, our ablation studies reveal noticeable improvements from integrating causal representation into the offline safe RL algorithm. Our code implementation is available at: https://sites.google.com/view/safe-fusion/.