Optimizing UAV-UGV Coalition Operations: A Hybrid Clustering and Multi-Agent Reinforcement Learning Approach for Path Planning in Obstructed Environment

January 3, 2024

Shamyo Brotee, Farhan Kabir, Md. Abdur Razzaque, Palash Roy, Md. Mamun-Or-Rashid, Md. Rafiul Hassan, Mohammad Mehedi Hassan

One of the most critical applications undertaken by coalitions of Unmanned Aerial Vehicles (UAVs) and Unmanned Ground Vehicles (UGVs) is reaching predefined targets by following the most time-efficient routes while avoiding collisions. Unfortunately, UAVs are hampered by limited battery life, and UGVs face challenges in reachability due to obstacles and elevation variations. Existing literature primarily focuses on one-to-one coalitions, which constrains the efficiency of reaching targets. In this work, we introduce a novel approach for a UAV-UGV coalition with a variable number of vehicles, employing a modified mean-shift clustering algorithm to segment targets into multiple zones. Each vehicle utilizes Multi-agent Deep Deterministic Policy Gradient (MADDPG) and Multi-agent Proximal Policy Optimization (MAPPO), two advanced reinforcement learning algorithms, to form an effective coalition for navigating obstructed environments without collisions. This approach of assigning targets to various circular zones, based on density and range, significantly reduces the time required to reach these targets. Moreover, introducing variability in the number of UAVs and UGVs in a coalition enhances task efficiency by enabling simultaneous multi-target engagement. The results of our experimental evaluation demonstrate that our proposed method substantially surpasses current state-of-the-art techniques, nearly doubling efficiency in terms of target navigation time and task completion rate.
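
The zoning step described above can be illustrated with a minimal sketch. It uses plain flat-kernel mean shift, not the authors' modified variant, whose details the abstract does not give. The function name `mean_shift_zones`, the `bandwidth` parameter (standing in for a vehicle's range), and the rule that a zone's radius is the distance from its center to its farthest target are all assumptions made for illustration.

```python
import numpy as np

def mean_shift_zones(targets, bandwidth, max_iter=100, tol=1e-6):
    """Group 2-D targets into circular zones using flat-kernel mean shift.

    Hypothetical helper: plain mean shift, not the paper's modified version.
    Returns (centers, radii, labels).
    """
    targets = np.asarray(targets, dtype=float)
    modes = targets.copy()  # one mode estimate per target, started at the target
    for _ in range(max_iter):
        # Pairwise distances from every current mode to every target.
        dists = np.linalg.norm(modes[:, None, :] - targets[None, :, :], axis=2)
        within = dists <= bandwidth
        # Move each mode to the mean of the targets inside its window.
        sums = (within[:, :, None] * targets[None, :, :]).sum(axis=1)
        new_modes = sums / within.sum(axis=1, keepdims=True)
        shift = np.linalg.norm(new_modes - modes, axis=1).max()
        modes = new_modes
        if shift < tol:
            break

    # Merge modes that converged to (almost) the same point into one zone.
    centers = []
    labels = np.empty(len(targets), dtype=int)
    for i, mode in enumerate(modes):
        for k, center in enumerate(centers):
            if np.linalg.norm(mode - center) < bandwidth / 2:
                labels[i] = k
                break
        else:
            centers.append(mode)
            labels[i] = len(centers) - 1
    centers = np.array(centers)

    # Assumed zone radius: distance from the center to its farthest target.
    radii = np.array([
        np.linalg.norm(targets[labels == k] - centers[k], axis=1).max()
        for k in range(len(centers))
    ])
    return centers, radii, labels


if __name__ == "__main__":
    # Two well-separated groups of targets (made-up coordinates).
    pts = [[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11]]
    centers, radii, labels = mean_shift_zones(pts, bandwidth=3.0)
    print(centers, radii, labels)
```

In this sketch, a larger `bandwidth` yields fewer, wider zones, so it controls the trade-off between the number of zones and the distance a vehicle travels within each one.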
