This paper introduces a decentralized multi-agent reinforcement learning framework that enables structurally heterogeneous teams of agents to jointly discover and acquire randomly located targets in environments characterized by partial observability, communication constraints, and dynamic interactions. Each agent's policy is trained with the Multi-Agent Proximal Policy Optimization algorithm and employs a Graph Attention Network encoder that integrates simulated range-sensing data with communication embeddings exchanged among neighboring agents, enabling context-aware decision-making from both local sensing and relational information. In particular, the framework unifies graph-based communication with trajectory-aware safety enforced through safety filters. The architecture is supported by a structured reward formulation designed to encourage effective target discovery and acquisition, collision avoidance, and de-correlation of the agents' communication vectors by promoting informational orthogonality. The effectiveness of the proposed reward function is demonstrated through a comprehensive ablation study, and simulation results confirm safe and stable task execution.
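The informational-orthogonality term mentioned above can be illustrated with a minimal sketch. The abstract does not specify the exact penalty, so the following is an assumed formulation: the mean squared off-diagonal cosine similarity between agents' communication vectors, which is zero when all vectors are mutually orthogonal and largest when they are identical.

```python
import numpy as np

def orthogonality_penalty(comm: np.ndarray) -> float:
    """Mean squared off-diagonal cosine similarity between the rows of
    `comm` (one communication vector per agent). Returns 0.0 when all
    vectors are mutually orthogonal. This is a hypothetical form of the
    de-correlation reward term, not the paper's exact definition."""
    # Normalize each agent's communication vector to unit length.
    norms = np.linalg.norm(comm, axis=1, keepdims=True)
    unit = comm / np.clip(norms, 1e-8, None)
    gram = unit @ unit.T                      # pairwise cosine similarities
    n = comm.shape[0]
    off_diag = gram[~np.eye(n, dtype=bool)]   # drop self-similarity terms
    return float(np.mean(off_diag ** 2))

# Mutually orthogonal messages incur no penalty; identical ones the maximum.
print(orthogonality_penalty(np.eye(3)))       # 0.0
print(orthogonality_penalty(np.ones((3, 4)))) # 1.0
```

Subtracting such a penalty from the team reward pressures agents to encode complementary rather than redundant information in their exchanged embeddings.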