We present TreeIRL, a novel planner for autonomous driving that combines Monte Carlo tree search (MCTS) and inverse reinforcement learning (IRL) to achieve state-of-the-art performance in simulation and in real-world driving. The core idea is to use MCTS to find a promising set of safe candidate trajectories and a deep IRL scoring function to select the most human-like among them. We evaluate TreeIRL against both classical and state-of-the-art planners in large-scale simulations and on 500+ miles of real-world autonomous driving in the Las Vegas metropolitan area. Test scenarios include dense urban traffic, adaptive cruise control, cut-ins, and traffic lights. TreeIRL achieves the best overall performance, striking a balance between safety, progress, comfort, and human-likeness. To our knowledge, our work is the first demonstration of MCTS-based planning on public roads and underscores the importance of evaluating planners across a diverse set of metrics and in real-world environments. TreeIRL is highly extensible and could be further improved with reinforcement learning and imitation learning, providing a framework for exploring different combinations of classical and learning-based approaches to solve the planning bottleneck in autonomous driving.
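To make the two-stage idea concrete, here is a minimal illustrative sketch of the selection step only: MCTS produces a set of safe candidate trajectories, and a learned IRL score picks the most human-like one. All names (Trajectory, generate-candidates helpers, irl_score) are assumptions for illustration, not the paper's actual API.

```python
# Illustrative sketch (assumed names, not the paper's implementation):
# given MCTS-generated candidates, keep the safe ones and return the
# candidate with the highest learned human-likeness (IRL) score.
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Trajectory:
    states: Sequence[Tuple[float, float, float, float]]  # e.g. (x, y, heading, speed) per step
    is_safe: bool  # safety flag attached during the tree search (assumed)


def select_trajectory(
    candidates: List[Trajectory],
    irl_score: Callable[[Trajectory], float],
) -> Trajectory:
    """Return the safe candidate with the highest IRL score."""
    safe = [t for t in candidates if t.is_safe]
    if not safe:
        raise RuntimeError("no safe candidate trajectory available")
    return max(safe, key=irl_score)
```

In this sketch the scoring function is an arbitrary callable, so a neural IRL model, a hand-tuned cost, or a mixture of both could be swapped in, which mirrors the extensibility claim in the abstract.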