In recent years, advances in underwater networking and multi-agent reinforcement learning (MARL) have significantly expanded the use of multiple autonomous underwater vehicles (AUVs) in marine exploration and target tracking. However, current MARL-driven cooperative tracking faces three critical challenges: 1) non-stationarity in decentralized coordination, where each agent's local policy updates destabilize its teammates' observation spaces and impede convergence; 2) inefficient exploration under sparse rewards, caused by limited underwater visibility and constrained sensor ranges, which leads to high-variance learning; and 3) fragility to water disturbances combined with dependence on handcrafted rewards, which degrades real-world robustness under unmodeled hydrodynamic conditions. To address these challenges, this paper proposes a hierarchical MARL architecture comprising four layers: global training scheduling, multi-agent coordination, local decision-making, and real-time execution. Through this hierarchical decomposition, the architecture optimizes task allocation and inter-AUV coordination. Building on this foundation, we propose the Supervised Diffusion-Aided MARL (SDA-MARL) algorithm, which features three innovations: 1) a dual-decision architecture with segregated experience pools that mitigates non-stationarity through structured experience replay; 2) a supervised learning mechanism that guides the diffusion model's reverse denoising process to generate high-fidelity training samples, accelerating convergence; and 3) disturbance-robust policy learning that incorporates a behavioral cloning loss, using high-quality replay actions to guide updates of the Deep Deterministic Policy Gradient (DDPG) network and eliminating the dependence on handcrafted rewards. In comprehensive underwater simulations, the SDA-MARL-based tracking algorithm achieves higher tracking precision than state-of-the-art methods.
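As a rough illustration of innovations 1) and 3), the following minimal sketch shows how segregated experience pools and a behavioral cloning term added to a Deep Deterministic Policy Gradient actor objective might fit together. It is not the SDA-MARL implementation: the class and function names, the reward-threshold rule for routing transitions, the squared-error form of the cloning term, and the `bc_weight` coefficient are all assumptions made for illustration, and the diffusion-based sample generation is omitted.

```python
import random
from collections import deque
from typing import Callable, Deque, List, Sequence, Tuple

# (state, action, reward, next_state); a hypothetical transition layout.
Transition = Tuple[List[float], List[float], float, List[float]]


class SegregatedReplay:
    """Two separate pools: ordinary transitions and high-quality ones."""

    def __init__(self, capacity: int, reward_threshold: float) -> None:
        self.regular: Deque[Transition] = deque(maxlen=capacity)
        self.elite: Deque[Transition] = deque(maxlen=capacity)
        # Assumed quality criterion; the source does not state the rule.
        self.reward_threshold = reward_threshold

    def add(self, transition: Transition) -> None:
        # Route each transition to exactly one pool based on its reward.
        reward = transition[2]
        pool = self.elite if reward >= self.reward_threshold else self.regular
        pool.append(transition)

    def sample(self, pool: Deque[Transition], batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement, capped at the pool size.
        return random.sample(list(pool), min(batch_size, len(pool)))


def actor_loss_with_bc(
    policy: Callable[[List[float]], List[float]],
    critic: Callable[[List[float], List[float]], float],
    rl_batch: Sequence[Transition],
    elite_batch: Sequence[Transition],
    bc_weight: float,
) -> float:
    """Actor objective with a behavioral cloning penalty (assumed form).

    loss = -mean Q(s, pi(s)) over rl_batch
           + bc_weight * mean ||pi(s) - a||^2 over elite_batch
    Both batches are assumed to be non-empty.
    """
    q_term = -sum(critic(s, policy(s)) for s, _, _, _ in rl_batch) / len(rl_batch)
    bc_term = sum(
        sum((p - a) ** 2 for p, a in zip(policy(s), action))
        for s, action, _, _ in elite_batch
    ) / len(elite_batch)
    return q_term + bc_weight * bc_term
```

In this sketch the cloning term pulls the policy toward actions stored in the high-quality pool, while the critic term is evaluated on ordinary transitions. In a real training loop the loss would be computed with an automatic differentiation framework so that its gradient can update the actor network.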