Gradient-based approaches in reinforcement learning (RL) have achieved tremendous success in learning policies for autonomous vehicles. While the performance of these approaches warrants real-world adoption, the resulting policies lack interpretability, which limits their deployability in the safety-critical and legally regulated domain of autonomous driving (AD). AD requires interpretable and verifiable control policies that maintain high performance. We propose Interpretable Continuous Control Trees (ICCTs), a tree-based model that can be optimized via modern gradient-based RL approaches to produce high-performing, interpretable policies. The key to our approach is a procedure that allows direct optimization in a sparse, decision-tree-like representation. We validate ICCTs against baselines across six domains, showing that ICCTs learn interpretable policy representations that match or outperform baselines by up to 33% in AD scenarios while using 300x-600x fewer policy parameters than deep learning baselines. Furthermore, we demonstrate the interpretability and utility of ICCTs through a 14-car physical robot demonstration.
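
To make the idea of a gradient-trainable, sparse tree policy more concrete, the sketch below shows one generic way such a model could be evaluated. It is not the ICCT procedure itself, which the abstract does not detail. The sigmoid gating, the `temperature` parameter, the fixed depth-2 layout, the single-feature decision nodes and leaves, and all function names are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_tree_action(x, nodes, leaves, temperature=1.0):
    """Differentiable forward pass of a hypothetical depth-2 tree.

    nodes:  [(feature_idx, threshold)] for root, left child, right child.
    leaves: [(feature_idx, weight, bias)] for the four leaves, left to right.
    Each decision node looks at a single feature (a sparse split) and each
    leaf is a one-feature linear controller producing a continuous action.
    """
    def p_left(node):
        feat, thresh = node
        # Smooth surrogate for the hard test x[feat] < thresh.
        return sigmoid((thresh - x[feat]) / temperature)

    p_root, p_l, p_r = p_left(nodes[0]), p_left(nodes[1]), p_left(nodes[2])
    # Probability of reaching each leaf; these four values sum to 1.
    path_probs = np.array([
        p_root * p_l,
        p_root * (1.0 - p_l),
        (1.0 - p_root) * p_r,
        (1.0 - p_root) * (1.0 - p_r),
    ])
    leaf_actions = np.array([w * x[f] + b for f, w, b in leaves])
    # Weighted mixture of leaf outputs, smooth in all parameters.
    return float(path_probs @ leaf_actions)

def crisp_tree_action(x, nodes, leaves):
    """Hard (human-readable) version of the same tree: plain if/else rules."""
    feat, thresh = nodes[0]
    if x[feat] < thresh:
        feat, thresh = nodes[1]
        leaf = 0 if x[feat] < thresh else 1
    else:
        feat, thresh = nodes[2]
        leaf = 2 if x[feat] < thresh else 3
    f, w, b = leaves[leaf]
    return float(w * x[f] + b)
```

In this sketch the soft version is what a gradient-based RL algorithm could differentiate through, while the crisp version is what a person would read and verify. As `temperature` shrinks, the soft output approaches the crisp one whenever no input feature sits exactly on a threshold.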