The application of artificial intelligence to simulating air-to-air combat scenarios is attracting increasing attention. To date, the high-dimensional state and action spaces, the high complexity of situational information (such as imperfect and filtered information, stochasticity, and incomplete knowledge of mission targets), and nonlinear flight dynamics pose significant challenges for accurate air combat decision-making. These challenges are exacerbated when multiple heterogeneous agents are involved. We propose a hierarchical multi-agent reinforcement learning framework for air-to-air combat with multiple heterogeneous agents. In our framework, the decision-making process is divided into two levels of abstraction: heterogeneous low-level policies control the actions of individual units, and a high-level commander policy issues macro commands given the overall mission targets. Low-level policies are trained for accurate unit combat control. Their training is organized as a learning curriculum with increasingly complex training scenarios and league-based self-play. The commander policy is then trained on mission targets, with the pre-trained low-level policies held fixed. Empirical validation supports the advantages of our design choices.
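To make the two-level decomposition concrete, here is a minimal sketch of the control loop under stated assumptions. It is not the authors' implementation. The macro commands (`engage`, `evade`, `hold`), the unit types, the rule-based functions standing in for learned policies, and the toy distance dynamics are all hypothetical. The sketch shows only the structure: a commander picks a macro command per unit on a coarser time scale, and a type-specific (heterogeneous) low-level policy turns that command plus the unit's own observation into a primitive action. No reinforcement learning, curriculum, or self-play is shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical macro command set (illustrative assumption, not taken from the paper).
MACRO_COMMANDS = ("engage", "evade", "hold")


@dataclass
class Unit:
    name: str
    unit_type: str            # heterogeneous agent type (illustrative names)
    distance_to_enemy: float  # toy observation
    health: float             # toy observation in [0, 1]


# A low-level policy maps (unit observation, macro command) -> primitive action.
LowLevelPolicy = Callable[[Unit, str], str]


def fighter_policy(unit: Unit, command: str) -> str:
    # Rule-based stand-in for a learned, pre-trained low-level policy.
    if command == "engage":
        return "fire" if unit.distance_to_enemy < 5.0 else "close_in"
    if command == "evade":
        return "break_turn"
    return "loiter"


def interceptor_policy(unit: Unit, command: str) -> str:
    # A different unit type gets its own policy (heterogeneity), e.g. longer weapon range.
    if command == "engage":
        return "fire" if unit.distance_to_enemy < 10.0 else "close_in"
    if command == "evade":
        return "climb"
    return "loiter"


def commander_policy(units: List[Unit]) -> Dict[str, str]:
    """Toy stand-in for a learned commander: one macro command per unit."""
    return {u.name: ("evade" if u.health < 0.3 else "engage") for u in units}


def hierarchical_step(
    units: List[Unit],
    low_level: Dict[str, LowLevelPolicy],
    commands: Dict[str, str],
) -> Dict[str, str]:
    # Each unit's low-level policy is selected by its type and conditioned on its command.
    return {u.name: low_level[u.unit_type](u, commands[u.name]) for u in units}


def run_episode(
    units: List[Unit],
    low_level: Dict[str, LowLevelPolicy],
    steps: int = 6,
    command_interval: int = 3,
) -> List[Dict[str, str]]:
    actions_log: List[Dict[str, str]] = []
    commands: Dict[str, str] = {}
    for t in range(steps):
        if t % command_interval == 0:
            # The commander acts on a coarser time scale than the low-level policies.
            commands = commander_policy(units)
        actions_log.append(hierarchical_step(units, low_level, commands))
        # Crude placeholder dynamics: every unit closes 2.0 distance units per step.
        for u in units:
            u.distance_to_enemy = max(0.0, u.distance_to_enemy - 2.0)
    return actions_log


if __name__ == "__main__":
    units = [Unit("f1", "fighter", 8.0, 1.0), Unit("i1", "interceptor", 12.0, 0.2)]
    low_level = {"fighter": fighter_policy, "interceptor": interceptor_policy}
    for t, actions in enumerate(run_episode(units, low_level)):
        print(t, actions)
```

Keeping the low-level policies fixed inside `run_episode` mirrors the abstract's ordering: low-level control is trained first, and the commander is trained afterwards on top of those frozen policies. In a learned version, `commander_policy` and the per-type functions would be replaced by trained networks.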