Multi-AUV Cooperative Target Tracking Based on Supervised Diffusion-Aided Multi-Agent Reinforcement Learning

In recent years, advances in underwater networking and multi-agent reinforcement learning (MARL) have significantly expanded the use of multiple autonomous underwater vehicles (AUVs) in marine exploration and target tracking. However, MARL-driven cooperative tracking currently faces three critical challenges: 1) non-stationarity in decentralized coordination, where each agent's local policy updates destabilize the observation spaces of its teammates and hinder convergence; 2) inefficient exploration under sparse rewards, caused by limited underwater visibility and constrained sensor ranges, which leads to high-variance learning; and 3) fragility to water disturbances combined with dependence on handcrafted rewards, which degrades real-world robustness under unmodeled hydrodynamic conditions. To address these challenges, this paper proposes a hierarchical MARL architecture comprising four layers: global training scheduling, multi-agent coordination, local decision-making, and real-time execution. Through this hierarchical decomposition, the architecture optimizes task allocation and inter-AUV coordination. Building on this foundation, we propose the Supervised Diffusion-Aided MARL (SDA-MARL) algorithm, which features three innovations: 1) a dual-decision architecture with segregated experience pools that mitigates non-stationarity through structured experience replay; 2) a supervised learning mechanism that guides the reverse denoising process of a diffusion model to generate high-fidelity training samples, accelerating convergence; and 3) disturbance-robust policy learning that adds a behavioral cloning loss to guide Deep Deterministic Policy Gradient (DDPG) network updates with high-quality replay actions, eliminating dependence on handcrafted rewards. In comprehensive underwater simulations, the SDA-MARL-based tracking algorithm achieves higher tracking precision than state-of-the-art methods.
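The third innovation pairs a DDPG actor update with a behavioral cloning (BC) loss on high-quality replay actions. The abstract does not give the exact loss, so the sketch below assumes the TD3+BC form: a deterministic policy gradient term scaled by lambda = alpha / mean|Q|, plus a mean-squared penalty pulling the policy's actions toward the replayed ones. The function name `bc_regularized_actor_loss`, the weight `alpha = 2.5`, and the toy policy and critic are hypothetical illustrations, not the SDA-MARL implementation. The sketch only evaluates the scalar loss; a deep learning framework would differentiate it with respect to the actor parameters.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def bc_regularized_actor_loss(
    states: Sequence[Vector],
    replay_actions: Sequence[Vector],
    policy: Callable[[Vector], List[float]],
    critic: Callable[[Vector, Vector], float],
    alpha: float = 2.5,  # hypothetical trade-off weight (TD3+BC default)
) -> float:
    """DDPG actor loss plus a behavioral-cloning penalty (TD3+BC-style sketch).

    loss = -lam * mean_i Q(s_i, pi(s_i)) + mean_i ||pi(s_i) - a_i||^2,
    with lam = alpha / mean_i |Q(s_i, pi(s_i))| so both terms share a scale.
    """
    if not states or len(states) != len(replay_actions):
        raise ValueError("need a non-empty batch with one replay action per state")
    n = len(states)
    actions = [policy(s) for s in states]
    q_values = [critic(s, a) for s, a in zip(states, actions)]
    # Normalize the Q term so the BC penalty is neither swamped nor dominant.
    mean_abs_q = sum(abs(q) for q in q_values) / n
    lam = alpha / max(mean_abs_q, 1e-8)  # guard against division by zero
    dpg_term = -lam * sum(q_values) / n
    # Squared distance between the policy's actions and the replayed actions.
    bc_term = sum(
        sum((p - b) ** 2 for p, b in zip(a, a_buf))
        for a, a_buf in zip(actions, replay_actions)
    ) / n
    return dpg_term + bc_term


if __name__ == "__main__":
    # Hypothetical 1-D toy: a linear policy and a critic peaking at a == s.
    toy_policy = lambda s: [0.5 * s[0]]
    toy_critic = lambda s, a: -(a[0] - s[0]) ** 2
    print(bc_regularized_actor_loss([[1.0], [2.0]], [[1.0], [2.0]], toy_policy, toy_critic))  # 3.125
```

The Q-normalization means alpha alone sets the balance between exploiting the critic and imitating replayed actions, regardless of the reward scale. In TD3+BC, lambda is also treated as a constant when taking gradients.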

翻译：近年来，水下组网和多智能体强化学习（MARL）的进展显著扩展了多自主水下航行器（AUV）在海洋探测与目标跟踪中的应用。然而，当前基于MARL驱动的协同跟踪面临三个关键挑战：1）分散协调中的非平稳性，即局部策略更新会破坏队友的观测空间稳定性，阻碍收敛；2）水下有限可见度和受限传感器范围导致的稀疏奖励探索效率低下，引发高方差学习；3）水流扰动脆弱性及手工设计奖励依赖，在未建模水动力条件下会降低实际部署的鲁棒性。为解决这些问题，本文提出一种包含四层结构的分层MARL架构：全局训练调度层、多智能体协调层、本地决策层和实时执行层。该架构通过层次化分解优化任务分配与AUV间协同。在此基础上，我们提出监督扩散辅助多智能体强化学习（SDA-MARL）算法，包含三项创新：1）双决策架构与分离经验池，通过结构化经验回放缓解非平稳性；2）监督学习机制引导扩散模型反向去噪过程，生成高保真训练样本加速收敛；3）融入行为克隆损失的扰动鲁棒策略学习，利用高质量回放动作指导深度确定性策略梯度网络更新，消除手工奖励依赖。基于SDA-MARL的跟踪算法在综合水下仿真中实现了优于现有方法的跟踪精度。