Reinforcement learning is an effective approach to decision-making problems, and investigating reinforcement-learning-based methods for autonomous air combat maneuver decision-making is a meaningful and valuable research direction. However, when reinforcement learning is applied to decision-making problems with sparse rewards, such as air combat maneuver decision-making, training is time-consuming and the trained agent may perform unsatisfactorily. To address these problems, a curriculum-learning-based method is proposed. First, three curricula for air combat maneuver decision-making are designed: an angle curriculum, a distance curriculum and a hybrid curriculum. Each curriculum is used to train an air combat agent, and the results are compared with those of the original method trained without any curriculum. The training results show that the angle curriculum increases the speed and stability of training and improves the performance of the agent; the distance curriculum increases the speed and stability of training; and the hybrid curriculum has a negative impact on training because it causes the agent to get stuck in a local optimum. The simulation results show that, after training, the agent can handle situations in which targets approach from different directions, and its maneuver decisions are consistent with the characteristics of the missile.
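The abstract does not say how the angle, distance and hybrid curricula are implemented. One common way to build curricula of this kind, assumed here purely for illustration, is to control the initial engagement geometry. Training starts with easy cases, where the target is near the nose of the aircraft (angle) or close by (distance), and the admissible range widens each time the agent's recent success rate passes a threshold. The class `CurriculumScheduler`, its stage count, thresholds and distance bounds are all hypothetical and are not taken from the paper. The `"none"` kind stands in for the baseline trained without a curriculum.

```python
import random
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CurriculumScheduler:
    """Hypothetical stage-based curriculum over initial engagement conditions.

    kind: "angle", "distance", "hybrid" (both at once) or "none" (no curriculum).
    All numeric defaults are illustrative assumptions, not values from the paper.
    """
    kind: str = "angle"
    num_stages: int = 5
    promote_threshold: float = 0.8  # assumed success rate needed to advance a stage
    window: int = 100               # assumed number of recent episodes used for that rate
    max_angle_deg: float = 180.0    # full task: target may start at any bearing
    min_dist_km: float = 5.0        # assumed distance bounds of the full task
    max_dist_km: float = 50.0
    stage: int = 0
    _results: deque = field(default_factory=deque, init=False, repr=False)

    def __post_init__(self) -> None:
        if self.kind not in ("angle", "distance", "hybrid", "none"):
            raise ValueError(f"unknown curriculum kind: {self.kind}")
        self._results = deque(maxlen=self.window)
        if self.kind == "none":
            # Baseline: train on the full task from the first episode.
            self.stage = self.num_stages - 1

    def progress(self) -> float:
        """Fraction of full-task difficulty unlocked so far, in (0, 1]."""
        return (self.stage + 1) / self.num_stages

    def angle_limit(self) -> float:
        """Largest absolute initial bearing (degrees) allowed at this stage."""
        if self.kind in ("angle", "hybrid"):
            return self.max_angle_deg * self.progress()
        return self.max_angle_deg

    def distance_limit(self) -> float:
        """Largest initial distance (km) allowed at this stage."""
        if self.kind in ("distance", "hybrid"):
            span = self.max_dist_km - self.min_dist_km
            return self.min_dist_km + span * self.progress()
        return self.max_dist_km

    def sample_initial_condition(self, rng: random.Random) -> dict:
        """Draw a starting geometry for the next training episode."""
        a = self.angle_limit()
        return {
            "bearing_deg": rng.uniform(-a, a),
            "distance_km": rng.uniform(self.min_dist_km, self.distance_limit()),
        }

    def record(self, success: bool) -> None:
        """Log an episode outcome and advance the stage when the agent is ready."""
        self._results.append(success)
        ready = (len(self._results) == self.window
                 and sum(self._results) / self.window >= self.promote_threshold)
        if self.stage < self.num_stages - 1 and ready:
            self.stage += 1
            self._results.clear()  # re-measure success on the harder stage
```

In a training loop, each episode would call `sample_initial_condition` before reset and `record` after termination. The `"hybrid"` kind widens both limits together, which is only one possible way to combine the two curricula. This sketch does not model the local-optimum behaviour the abstract reports for the hybrid curriculum.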