The integration of unmanned platforms equipped with advanced sensors promises to enhance situational awareness and mitigate the "fog of war" in military operations. However, managing the vast influx of data from these platforms poses a significant challenge for Command and Control (C2) systems. This study presents a novel multi-agent learning framework to address this challenge. Our method enables autonomous and secure communication between agents and humans, which in turn enables real-time formation of an interpretable Common Operational Picture (COP). Each agent encodes its perceptions and actions into compact vectors, which are then transmitted, received, and decoded to form a COP encompassing the current state of all agents (friendly and enemy) on the battlefield. Using Deep Reinforcement Learning (DRL), we jointly train the COP models and the agents' action-selection policies. We demonstrate resilience to degraded conditions such as GPS denial and disrupted communications. Experimental validation is performed in the StarCraft II simulation environment to evaluate the precision of the COPs and the robustness of the policies. We report COP errors below 5% and policies that remain resilient under a range of adversarial conditions. In summary, our contributions include a method for autonomous COP formation, increased resilience through distributed prediction, and joint training of COP models and multi-agent RL policies. This research advances adaptive and resilient C2, facilitating effective control of heterogeneous unmanned platforms.
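The encode-transmit-decode pipeline described above can be illustrated with a minimal sketch. This is not the paper's actual architecture: the dimensions, the `encode`/`decode` functions, and the random (untrained) weights are all illustrative assumptions; in the framework itself these mappings are learned jointly with the policies via DRL.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not the paper's values)
OBS_DIM, MSG_DIM, N_AGENTS, STATE_DIM = 16, 4, 3, 8

# Stand-in weights; in the framework these would be trained end-to-end
W_enc = rng.normal(size=(OBS_DIM, MSG_DIM))
W_dec = rng.normal(size=(N_AGENTS * MSG_DIM, STATE_DIM))

def encode(obs):
    """Compress a local observation into a compact message vector."""
    return np.tanh(obs @ W_enc)

def decode(messages):
    """Fuse all received messages into a predicted global state (the COP)."""
    return np.tanh(np.concatenate(messages) @ W_dec)

# Each agent encodes its own perception; the messages are broadcast
# and any receiver can decode them into a shared operational picture.
observations = [rng.normal(size=OBS_DIM) for _ in range(N_AGENTS)]
messages = [encode(o) for o in observations]
cop = decode(messages)
print(cop.shape)
```

Because every agent can decode the full message set locally, the COP prediction is distributed: losing one node degrades, but does not eliminate, the shared picture, which is the resilience property the abstract highlights.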