IMAGINE: Intelligent Multi-Agent Godot-based Indoor Networked Exploration

Exploring unknown, Global Navigation Satellite System (GNSS)-denied environments with an autonomous, communication-aware, collaborative group of Unmanned Aerial Vehicles (UAVs) presents significant challenges in coordination, perception, and decentralized decision-making. This paper applies Multi-Agent Reinforcement Learning (MARL) to these challenges in a 2D indoor environment, using high-fidelity game-engine simulations (Godot) and continuous action spaces. Policy training targets emergent collaborative behaviours and decision-making under uncertainty, formulated as Network-Distributed Partially Observable Markov Decision Processes (ND-POMDPs). Each UAV is equipped with a Light Detection and Ranging (LiDAR) sensor and can share data (sensor measurements and a local occupancy map) with neighbouring agents. Inter-agent communication is constrained by limited range, limited bandwidth, and latency. Extensive ablation studies evaluate MARL training paradigms, the reward function, the communication system, the neural network (NN) architecture, memory mechanisms, and POMDP formulations. This work jointly addresses several key limitations of prior research: reliance on discrete actions, single-agent or centralized formulations, assumptions of a priori knowledge and permanent connectivity, inability to handle dynamic obstacles, short planning horizons, and the architectural complexity of Recurrent NNs and Transformers. Results show that the scalable training paradigm, combined with a simplified architecture, enables rapid autonomous exploration of an indoor area. Curriculum learning over five increasingly complex levels also made training faster and more robust. This combination of high-fidelity simulation, MARL formulation, and computational efficiency establishes a strong foundation for deploying learned cooperative strategies in physical robotic systems.
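
The range-limited sharing of local occupancy maps described above can be illustrated with a minimal sketch. This is not the paper's implementation: the cell encoding, the names `merge_maps`, `share_step`, and `explored_fraction`, and the rule that a neighbour's value only fills unknown cells are all assumptions made for illustration. Bandwidth and latency limits are ignored.

```python
from math import dist
from typing import List, Tuple

# Hypothetical cell encoding; the paper does not specify one.
UNKNOWN, FREE, OCCUPIED = -1, 0, 1
Grid = List[List[int]]


def merge_maps(own: Grid, received: Grid) -> Grid:
    """Fill cells that are unknown in `own` with the neighbour's value."""
    return [
        [r if o == UNKNOWN else o for o, r in zip(own_row, rec_row)]
        for own_row, rec_row in zip(own, received)
    ]


def share_step(
    positions: List[Tuple[float, float]],
    maps: List[Grid],
    comm_range: float,
) -> List[Grid]:
    """One communication round with a hard range limit.

    Every agent merges the maps of neighbours within `comm_range`.
    Merging reads the maps as they were at the start of the round,
    so information travels at most one hop per round.
    """
    merged = []
    for i, own in enumerate(maps):
        result = [row[:] for row in own]  # copy so inputs stay untouched
        for j, other in enumerate(maps):
            if i != j and dist(positions[i], positions[j]) <= comm_range:
                result = merge_maps(result, other)
        merged.append(result)
    return merged


def explored_fraction(grid: Grid) -> float:
    """Share of cells whose state is known (free or occupied)."""
    cells = [c for row in grid for c in row]
    return sum(c != UNKNOWN for c in cells) / len(cells)


# Three agents on a 1x3 grid; the third is out of communication range.
positions = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
maps = [
    [[FREE, UNKNOWN, UNKNOWN]],
    [[UNKNOWN, OCCUPIED, UNKNOWN]],
    [[UNKNOWN, UNKNOWN, FREE]],
]
after = share_step(positions, maps, comm_range=2.0)
```

In this toy case the first two agents end the round with identical maps, while the third learns nothing because it is out of range. A decentralized policy would have to weigh this kind of trade-off between spreading out to explore and staying close enough to communicate.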
