Multi-vehicle pursuit (MVP), such as autonomous police vehicles pursuing suspects, is important but very challenging due to its mission-critical and safety-critical nature. While multi-agent reinforcement learning (MARL) algorithms have been proposed for the MVP problem on structured grid-pattern roads, existing algorithms draw training samples at random in centralized learning, which leads to homogeneous agents with poor collaboration. For the more challenging problem of pursuing multiple evading vehicles, these algorithms typically assign a fixed target evading vehicle to the pursuing vehicles without considering dynamic traffic situations, which significantly reduces the pursuit success rate. To address these problems, this paper proposes Progression Cognition Reinforcement Learning with Prioritized Experience for MVP (PEPCRL-MVP) in urban multi-intersection dynamic traffic scenes. PEPCRL-MVP uses a prioritization network to assess the transitions in the global experience replay buffer according to the parameters of each MARL agent. The personalized, prioritized experience set selected by the prioritization network introduces diversity into the MARL learning process, which can improve collaboration and task-related performance. Furthermore, PEPCRL-MVP employs an attention module to extract critical features from complex urban traffic environments. These features are used to develop a progression cognition method that adaptively groups pursuing vehicles, so that each group efficiently targets one evading vehicle in dynamic driving environments. Extensive experiments with a simulator over unstructured roads in an urban area show that PEPCRL-MVP outperforms other state-of-the-art methods. Specifically, PEPCRL-MVP improves pursuit efficiency by 3.95% over TD3-DMAP, and its success rate is 34.78% higher than that of MADDPG. The code is open-sourced.
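The two mechanisms summarized above, per-agent selection from a shared replay buffer and grouping of pursuers around evaders, can be illustrated with a minimal Python sketch. This is a hypothetical illustration under stated assumptions, not the PEPCRL-MVP implementation: `priority_score` is a linear stand-in for the learned prioritization network (agent-specific weights play the role of "the parameters of each MARL agent"), and `group_pursuers` is a greedy nearest-evader assignment with a group-size cap that stands in for the attention-based progression cognition method. All names (`Transition`, `priority_score`, `select_personalized_batch`, `group_pursuers`) are assumptions introduced for this example.

```python
import heapq
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class Transition:
    """One (s, a, r, s') sample stored in the shared, global replay buffer."""
    state: List[float]
    action: int
    reward: float
    next_state: List[float]


def priority_score(agent_weights: Sequence[float], t: Transition) -> float:
    """HYPOTHETICAL stand-in for the prioritization network: a linear score of
    the transition's state features under agent-specific weights, plus reward."""
    return sum(w * x for w, x in zip(agent_weights, t.state)) + t.reward


def select_personalized_batch(
    buffer: Sequence[Transition], agent_weights: Sequence[float], batch_size: int
) -> List[Transition]:
    """Return the batch_size highest-scoring transitions for one agent.
    Because scores depend on each agent's own weights, different agents draw
    different subsets from the same global buffer."""
    return heapq.nlargest(
        batch_size, buffer, key=lambda t: priority_score(agent_weights, t)
    )


def group_pursuers(
    pursuers: Sequence[Point], evaders: Sequence[Point]
) -> Dict[int, List[int]]:
    """HYPOTHETICAL greedy grouping: repeatedly take the closest remaining
    (pursuer, evader) pair and add the pursuer to that evader's group, capping
    each group at ceil(len(pursuers) / len(evaders)) members. Every pursuer
    ends up in exactly one group."""
    if not evaders:
        return {}
    cap = math.ceil(len(pursuers) / len(evaders))
    # All pursuer-evader pairs ordered by Euclidean distance (ties broken by index).
    pairs = sorted(
        (math.dist(p, e), i, j)
        for i, p in enumerate(pursuers)
        for j, e in enumerate(evaders)
    )
    groups: Dict[int, List[int]] = {j: [] for j in range(len(evaders))}
    assigned = set()
    for _, i, j in pairs:
        if i not in assigned and len(groups[j]) < cap:
            groups[j].append(i)
            assigned.add(i)
    return groups


if __name__ == "__main__":
    buf = [
        Transition([2.0, 0.0], 0, 0.0, [0.0, 0.0]),
        Transition([0.0, 2.0], 1, 0.0, [0.0, 0.0]),
        Transition([1.0, 1.0], 0, 0.0, [0.0, 0.0]),
    ]
    # Two agents that weight different state features pick different batches.
    for name, w in (("agent A", [1.0, 0.0]), ("agent B", [0.0, 1.0])):
        print(name, [t.state for t in select_personalized_batch(buf, w, 2)])
    # Four pursuers split into two groups, one per evader.
    print(group_pursuers([(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)],
                         [(0.0, 1.0), (10.0, 1.0)]))
```

The sketch uses deterministic top-k selection for readability; a proportional scheme in the style of prioritized experience replay (sampling transitions with probability proportional to their scores) would fit the same interface. Likewise, the distance-based grouping ignores traffic features entirely, whereas the abstract describes grouping driven by attention-extracted features of the urban environment.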