Navigating safely and efficiently in dense and heterogeneous traffic scenarios is challenging for autonomous vehicles (AVs) because they cannot infer the behaviors or intentions of nearby drivers. In this work, we propose a distributed multi-agent reinforcement learning (MARL) algorithm with trajectory and intent prediction for dense and heterogeneous traffic scenarios. Our approach for intent-aware planning, iPLAN, allows agents to infer nearby drivers' intents solely from their local observations. We model two distinct incentives for agents' strategies: behavioral incentives, which drive agents' long-term planning based on their driving behavior or personality, and instant incentives, which drive agents' short-term planning for collision avoidance based on the current traffic state. We design a two-stream inference module that allows agents to infer their opponents' incentives and incorporate the inferred information into decision-making. We perform experiments in two simulation environments, Non-Cooperative Navigation and Heterogeneous Highway. In Heterogeneous Highway, results show that, compared with centralized MARL baselines such as QMIX and MAPPO, our method yields 4.0% and 35.7% higher episodic reward in mild and chaotic traffic, respectively, with a 48.1% higher success rate and 80.6% longer survival time in chaotic traffic. Compared with the decentralized baseline IPPO, our method achieves 9.2% and 10.3% higher episodic reward in mild and chaotic traffic, respectively, along with a 25.3% higher success rate and 13.7% longer survival time.