Autonomous agents that drive on roads shared with human drivers must reason about the nuanced interactions among traffic participants. This poses a highly challenging decision-making problem, since human behavior is influenced by a multitude of factors (e.g., intentions and emotions) that are hard to model. This paper presents a decision-making approach for autonomous driving, focusing on the complex task of merging into moving traffic, where uncertainty arises from the behavior of other drivers and from imperfect sensor measurements. We frame the problem as a partially observable Markov decision process (POMDP) and solve it online with Monte Carlo tree search. The solution to the POMDP is a policy that performs high-level driving maneuvers, such as giving way to an approaching car, keeping a safe distance from the vehicle in front, or merging into traffic. Our method leverages a model learned from data to predict the future states of traffic while explicitly accounting for interactions among the surrounding agents. From these predictions, the autonomous vehicle can anticipate the consequences of its actions on the environment and optimize its trajectory accordingly. We test our approach extensively in simulation, showing that the autonomous vehicle can adapt its behavior to different situations. We also compare against other methods, demonstrating an improvement with respect to the considered performance metrics.
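Framing the task as a POMDP implies that the vehicle keeps a belief over quantities it cannot observe directly, such as whether an approaching driver intends to give way. The following is a minimal sketch of such a belief update, not the model used in this paper: the two intention labels, the braking observation, and the probabilities in `P_BRAKE` are hypothetical values chosen only for illustration, and the hidden intention is assumed to stay fixed over time.

```python
from typing import Dict

# Hypothetical observation model (made-up numbers): probability that a driver
# with a given hidden intention is observed braking during one time step.
P_BRAKE = {"yield": 0.8, "ignore": 0.2}


def update_belief(belief: Dict[str, float], observed_brake: bool) -> Dict[str, float]:
    """Bayes update of the belief over a static hidden driver intention."""
    unnormalized = {}
    for intention, prior in belief.items():
        # Likelihood of the observation under this intention.
        if observed_brake:
            likelihood = P_BRAKE[intention]
        else:
            likelihood = 1.0 - P_BRAKE[intention]
        unnormalized[intention] = likelihood * prior
    total = sum(unnormalized.values())
    # Normalize so the belief sums to one.
    return {k: v / total for k, v in unnormalized.items()}


# Start from a uniform belief and fold in three observations.
belief = {"yield": 0.5, "ignore": 0.5}
for obs in (True, True, False):
    belief = update_belief(belief, obs)
```

In an online solver such as Monte Carlo tree search, a belief of this kind would typically be the root from which hidden states are sampled before simulating candidate maneuvers; the sketch covers only the belief update, not the search.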