Autonomous Unmanned Aerial Vehicle (UAV) swarms are increasingly used as rapidly deployable aerial relays and sensing platforms, yet practical deployments must operate under partial observability and intermittent peer-to-peer links. We present a graph-based multi-agent reinforcement learning framework trained under centralized training with decentralized execution (CTDE): a centralized critic and global state are available only during training, while each UAV executes a shared policy using local observations and messages from nearby neighbors. Our architecture encodes local agent state and nearby entities with an agent-entity attention module, and aggregates inter-UAV messages with neighbor self-attention over a distance-limited communication graph. We evaluate primarily on a cooperative relay deployment task (DroneConnect) and secondarily on an adversarial engagement task (DroneCombat). In DroneConnect, the proposed method achieves high coverage under restricted communication and partial observation (e.g., 74% coverage with M = 5 UAVs and N = 10 nodes), remains competitive with an offline upper bound computed via mixed-integer linear programming (MILP), and generalizes to unseen team sizes without fine-tuning. In the adversarial setting, the same framework transfers without architectural changes and improves win rate over non-communicating baselines.
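As a hedged illustration of the message-aggregation step described above, the following sketch shows neighbor self-attention over a distance-limited communication graph. All names and the single-head, projection-free formulation are assumptions for clarity; the paper's actual module presumably uses learned query/key/value projections and is trained end-to-end.

```python
import numpy as np

def neighbor_attention(positions, messages, comm_radius):
    """Aggregate each UAV's neighbor messages with scaled dot-product
    attention, restricted to peers within a fixed communication radius.
    Hypothetical sketch: no learned projections, single attention head."""
    n, d = messages.shape
    # Distance-limited communication graph: edge (i, j) iff UAVs i and j
    # are within comm_radius of each other (self-edges excluded).
    dists = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    adj = (dists <= comm_radius) & ~np.eye(n, dtype=bool)

    out = np.zeros_like(messages)
    for i in range(n):
        nbrs = np.where(adj[i])[0]
        if nbrs.size == 0:
            # Isolated UAV: no incoming messages, fall back to its own.
            out[i] = messages[i]
            continue
        # Attention scores of UAV i's message against each neighbor's.
        scores = messages[nbrs] @ messages[i] / np.sqrt(d)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        # Aggregated message: attention-weighted sum over neighbors.
        out[i] = weights @ messages[nbrs]
    return out
```

With a single neighbor the softmax weight is 1, so the aggregated message equals that neighbor's message; an isolated UAV simply keeps its own, reflecting decentralized execution under intermittent links.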