In swarm robotics, confrontation, including the pursuit-evasion game, is a key scenario. High uncertainty caused by opponents' unknown strategies and by dynamic obstacles complicates the action space, turning decision-making into a hybrid process that mixes discrete and continuous actions. Although deep reinforcement learning is well suited to swarm confrontation because it can handle swarms of various sizes, as an end-to-end implementation it cannot deal with this hybrid process. Here, we propose a novel hierarchical reinforcement learning approach consisting of a target allocation layer, a path planning layer, and an underlying dynamic interaction mechanism between the two layers that reflects the quantified uncertainty. The approach decouples the hybrid process into a discrete allocation layer and a continuous planning layer, and uses a probabilistic ensemble model to quantify the uncertainty and adaptively regulate the interaction frequency. Furthermore, to overcome the training instability introduced by the two layers, we design an integration training method comprising pre-training and cross-training, which improves training efficiency and stability. Experimental results from both comparison and ablation studies validate the effectiveness and generalization performance of the proposed approach.
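The idea of using ensemble disagreement to regulate how often the allocation layer and the planning layer interact can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the variance-based uncertainty measure, the `scale` parameter, and the rule mapping uncertainty to an interval are all assumptions made for illustration. The sketch treats the spread of predictions across ensemble members as the uncertainty and shortens the number of planning steps between target re-allocations as that spread grows.

```python
from statistics import pvariance
from typing import Sequence


def ensemble_uncertainty(predictions: Sequence[Sequence[float]]) -> float:
    """Mean per-dimension variance across ensemble members.

    predictions[m][d] is member m's prediction for state dimension d.
    (Assumed uncertainty measure; the abstract does not specify one.)
    """
    dims = len(predictions[0])
    per_dim = [
        pvariance([member[d] for member in predictions]) for d in range(dims)
    ]
    return sum(per_dim) / dims


def allocation_interval(
    uncertainty: float, k_min: int = 1, k_max: int = 20, scale: float = 1.0
) -> int:
    """Planning steps to run before the next target re-allocation.

    Hypothetical rule: low uncertainty gives a long interval (up to k_max),
    high uncertainty gives a short one (down to k_min).
    """
    confidence = 1.0 / (1.0 + uncertainty / scale)  # lies in (0, 1]
    steps = round(k_min + (k_max - k_min) * confidence)
    return max(k_min, min(k_max, steps))


# Ensemble members that agree vs. members that disagree strongly.
agree = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
disagree = [[0.0, 0.0], [4.0, 4.0], [8.0, 8.0]]

interval_agree = allocation_interval(ensemble_uncertainty(agree))
interval_disagree = allocation_interval(ensemble_uncertainty(disagree))
```

With identical predictions the variance is zero, so the interval reaches `k_max`; when the members disagree, the interval shrinks and the allocation layer is consulted more often.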