Decentralized cooperative pursuit in cluttered environments is challenging for autonomous aerial swarms, especially under partial and noisy perception. Existing methods often rely on abstracted geometric features or privileged ground-truth states, and thereby sidestep the perceptual uncertainty encountered in real-world settings. We propose a decentralized end-to-end multi-agent reinforcement learning (MARL) framework that maps raw LiDAR observations directly to continuous control commands. Central to the framework is the Predictive Spatio-Temporal Observation (PSTO), an egocentric grid representation that aligns obstacle geometry with predicted adversarial intent and teammate motion in a unified, fixed-resolution projection. Built on PSTO, a single decentralized policy enables agents to navigate static obstacles, intercept dynamic targets, and maintain cooperative encirclement. Simulations demonstrate that the proposed method achieves superior capture efficiency and competitive success rates compared to state-of-the-art learning-based approaches that rely on privileged obstacle information. Furthermore, the unified policy scales across different team sizes without retraining. Finally, fully autonomous outdoor experiments validate the framework on a quadrotor swarm relying solely on onboard sensing and computing.
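To make the idea of an egocentric, fixed-resolution grid concrete, the following is a minimal sketch of how obstacle returns, a predicted target path, and predicted teammate paths could be rasterized into one aligned tensor. It is an illustration only and not the PSTO construction itself: the three-channel layout, the 2D projection, the grid size and cell resolution, the constant-velocity prediction, and the time-decayed weighting are all assumptions, and the names `to_cells`, `splat_rollout`, and `build_grid` are hypothetical.

```python
import numpy as np


def to_cells(points_xy, grid_size, cell_m):
    """Map egocentric (x, y) points in metres to grid indices, dropping points outside the grid."""
    pts = np.asarray(points_xy, dtype=float).reshape(-1, 2)
    half_extent = grid_size * cell_m / 2.0  # agent sits at the grid centre
    idx = np.floor((pts + half_extent) / cell_m).astype(int)
    inside = np.all((idx >= 0) & (idx < grid_size), axis=1)
    return idx[inside]


def splat_rollout(channel, pos_xy, vel_xy, grid_size, cell_m, horizon_s, steps):
    """Write a constant-velocity rollout into one channel (assumed motion model).

    Later time steps get smaller weights, so the channel encodes both
    where an entity is now and where it is predicted to go.
    """
    pos = np.asarray(pos_xy, dtype=float).reshape(-1, 2)
    vel = np.asarray(vel_xy, dtype=float).reshape(-1, 2)
    for k in range(steps):
        frac = k / max(steps - 1, 1)       # 0 at the present, 1 at the horizon
        weight = 1.0 - 0.5 * frac          # assumed linear decay from 1.0 to 0.5
        cells = to_cells(pos + vel * frac * horizon_s, grid_size, cell_m)
        rows, cols = cells[:, 0], cells[:, 1]
        channel[rows, cols] = np.maximum(channel[rows, cols], weight)


def build_grid(lidar_xy, target_xy, target_vel, mates_xy, mates_vel,
               grid_size=32, cell_m=0.5, horizon_s=1.0, steps=5):
    """Return a (3, grid_size, grid_size) float32 tensor in the agent's own frame.

    Channel 0: obstacle occupancy from LiDAR returns.
    Channel 1: predicted target path.
    Channel 2: predicted teammate paths.
    """
    grid = np.zeros((3, grid_size, grid_size), dtype=np.float32)
    occupied = to_cells(lidar_xy, grid_size, cell_m)
    grid[0, occupied[:, 0], occupied[:, 1]] = 1.0
    splat_rollout(grid[1], target_xy, target_vel, grid_size, cell_m, horizon_s, steps)
    splat_rollout(grid[2], mates_xy, mates_vel, grid_size, cell_m, horizon_s, steps)
    return grid
```

Because every channel shares the same origin and cell size, the output shape is independent of the number of obstacles or teammates, which is one plausible reason a fixed-resolution projection would let a single policy be reused across team sizes.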