Employing multiple autonomous underwater vehicles (AUVs) to perform underwater target tracking collaboratively is of great significance. However, meeting the various requirements of this task with traditional control methods is challenging. We therefore propose FISHER, an effective two-stage learning-from-demonstrations training framework that highlights the adaptability of reinforcement learning (RL) methods in the multi-AUV underwater target tracking task while addressing RL's limitations, such as the large number of required environmental interactions and the difficulty of designing reward functions. The first stage uses imitation learning (IL) to improve the policy and generate offline datasets. Specifically, we introduce a multi-agent discriminator-actor-critic, built on improvements to the generative adversarial IL algorithm, together with a multi-agent IL optimization objective derived from the Nash equilibrium condition. In the second stage, we develop a multi-agent independent generalized decision transformer, which uses latent representations to match the future states of high-quality samples instead of relying on a reward function, yielding further enhanced policies capable of handling various scenarios. In addition, we propose a simulation-to-simulation demonstration generation procedure that facilitates the generation of expert demonstrations in underwater environments; it leverages traditional control methods and can readily accomplish the domain transfer needed to obtain demonstrations. Extensive simulation experiments across multiple scenarios show that FISHER exhibits strong stability, multi-task performance, and generalization capability.
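The abstract does not state which reward the multi-agent discriminator-actor-critic derives from its discriminator. As a minimal sketch under that gap, the snippet below shows one common choice in discriminator-based IL methods: the logit log D - log(1 - D), where D(s, a) is assumed to be the discriminator's estimated probability that a state-action pair comes from the expert demonstrations. The function names are hypothetical, and the per-agent helper only illustrates giving each AUV its own reward. It is not FISHER's Nash-equilibrium-derived objective.

```python
import math
from typing import List


def discriminator_reward(d_expert_prob: float, eps: float = 1e-8) -> float:
    """Surrogate reward from a discriminator output (assumed convention).

    d_expert_prob: D(s, a), the assumed probability that the (state, action)
    pair was drawn from expert demonstrations rather than the current policy.
    Returns log D - log(1 - D), i.e. the logit of D: positive when the pair
    looks expert-like, negative when it looks policy-generated.
    """
    # Clip D away from 0 and 1 so both logarithms stay finite.
    d = min(max(d_expert_prob, eps), 1.0 - eps)
    return math.log(d) - math.log(1.0 - d)


def per_agent_rewards(d_probs: List[float]) -> List[float]:
    """Hypothetical multi-agent use: one independent reward per AUV,
    computed from that agent's own discriminator score."""
    return [discriminator_reward(p) for p in d_probs]


if __name__ == "__main__":
    # A pair the discriminator cannot tell apart (D = 0.5) earns zero reward.
    print(per_agent_rewards([0.5, 0.9, 0.1]))
```

The logit form is symmetric around D = 0.5, so expert-like and policy-like pairs receive rewards of equal magnitude and opposite sign; other surrogates (for example, -log(1 - D)) are also used in the GAIL family and would change how strongly the actor-critic is pushed toward expert behavior.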