We study the problem of long-term (multi-day) mapping of a river plume using multiple autonomous underwater vehicles (AUVs), focusing on the Douro river as a representative use case. We propose an energy- and communication-efficient multi-agent reinforcement learning approach in which a central coordinator intermittently communicates with the AUVs, collecting measurements and issuing commands. Our approach integrates spatiotemporal Gaussian process regression (GPR) with a multi-head Q-network controller that regulates the direction and speed of each AUV. Simulations using the Delft3D ocean model demonstrate that our method consistently outperforms both single- and multi-agent benchmarks, with increases in the number of agents improving both mean squared error (MSE) and operational endurance. In some instances, doubling the number of AUVs more than doubles endurance while maintaining or improving accuracy, underscoring the benefits of multi-agent coordination. Our learned policies generalize across unseen seasonal regimes spanning different months and years, demonstrating promise for future development of data-driven long-term monitoring of dynamic plume environments.