This paper tackles decentralized continuous task allocation in heterogeneous multi-agent systems. We present HIPPO-MAT, a novel framework that combines graph neural networks (GNNs) based on the GraphSAGE architecture, which compute independent embeddings on each agent, with Independent Proximal Policy Optimization (IPPO) for multi-agent deep reinforcement learning. In our system, unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs) share aggregated observation data over communication channels, and each agent independently processes these inputs to generate enriched state embeddings. This design enables dynamic, cost-optimal, conflict-aware task allocation in a 3D grid environment without centralized coordination. A modified A* path planner provides efficient routing and collision avoidance. Simulation experiments demonstrate scalability to 30 agents. Preliminary real-world validation confirms the practical viability of the approach, which incorporates simultaneous localization and mapping (SLAM). This validation uses JetBot ROS AI Robots that each run their model on a Jetson Nano and communicate via the ESP-NOW protocol over ESP32-S3 modules. Experimental results show that our method achieves a 92.5% conflict-free success rate with a performance gap of only 16.49% relative to the centralized Hungarian method, while outperforming a greedy heuristic decentralized baseline. The framework also keeps allocation processing to 0.32 simulation steps with up to 30 agents and responds robustly to dynamically generated tasks.
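To make the per-agent embedding step more concrete, the following is a minimal sketch of a single GraphSAGE layer with a mean aggregator, as one agent might run it locally on its own feature vector plus the feature vectors received from its neighbours. The function names `matvec` and `sage_layer`, the feature sizes, and the random weights are hypothetical. The abstract does not specify the aggregator type, the number of layers, or the observation layout HIPPO-MAT actually uses, so this is an illustration of the general GraphSAGE update, not the authors' model.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sage_layer(h_self, h_neighbors, W):
    """One GraphSAGE layer with a mean aggregator (hypothetical sizes).

    h_self:      this agent's feature vector (length d).
    h_neighbors: feature vectors received from neighbours (each length d).
    W:           weight matrix of shape (d_out, 2*d), applied to the
                 concatenation [h_self || mean(h_neighbors)].
    """
    d = len(h_self)
    if h_neighbors:
        # Element-wise mean of the neighbours' features.
        agg = [sum(h[i] for h in h_neighbors) / len(h_neighbors) for i in range(d)]
    else:
        agg = [0.0] * d  # isolated agent: no messages received
    z = matvec(W, list(h_self) + agg)      # concatenate, then linear map
    z = [max(0.0, v) for v in z]           # ReLU nonlinearity
    norm = math.sqrt(sum(v * v for v in z)) or 1.0
    return [v / norm for v in z]           # L2-normalise the embedding


if __name__ == "__main__":
    random.seed(0)
    d, d_out = 4, 3
    W = [[random.uniform(-1.0, 1.0) for _ in range(2 * d)] for _ in range(d_out)]
    own_obs = [1.0, 0.0, 0.5, 0.2]                       # hypothetical UAV observation
    received = [[0.0, 1.0, 0.3, 0.9], [0.2, 0.4, 0.0, 0.1]]  # hypothetical neighbour messages
    print(sage_layer(own_obs, received, W))
```

Because each agent applies the layer only to its own features and to what it has received, no central node has to see the whole graph. This locality is what allows embeddings to be computed independently on every robot. In an IPPO setup, such an embedding would typically be fed to that agent's own policy and value networks.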