Autonomous Underwater Vehicles (AUVs) require reliable six-degree-of-freedom (6-DOF) position control to operate effectively in complex and dynamic marine environments. Traditional controllers are effective under nominal conditions but exhibit degraded performance when faced with unmodeled dynamics or environmental disturbances. Reinforcement learning (RL) provides a powerful alternative, but training is typically slow and sim-to-real transfer remains challenging. This work introduces a GPU-accelerated RL training pipeline built in JAX and MuJoCo-XLA (MJX). By jointly JIT-compiling large-scale parallel physics simulation and learning updates, we achieve training times of under two minutes. Through systematic evaluation of multiple RL algorithms, we demonstrate robust 6-DOF trajectory tracking and effective disturbance rejection in real underwater experiments, with policies transferred zero-shot from simulation.
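The core pattern behind the pipeline described above — vectorizing the environment step across many parallel simulations and JIT-compiling the whole batch — can be sketched in plain JAX. The toy dynamics below are a hypothetical stand-in for an MJX physics step (the real pipeline would call the MJX step function instead); the environment count and step size are illustrative, not values from the paper.

```python
import jax
import jax.numpy as jnp

def env_step(state, action):
    """Hypothetical toy dynamics standing in for an MJX physics step.

    state = [position, velocity]; a simple double integrator with dt = 0.01.
    """
    pos, vel = state
    vel = vel + 0.01 * action   # integrate acceleration
    pos = pos + 0.01 * vel      # integrate velocity
    return jnp.array([pos, vel])

# vmap vectorizes the single-env step across N parallel environments;
# jit compiles the whole batched step into one fused XLA program.
batched_step = jax.jit(jax.vmap(env_step))

n_envs = 4096  # illustrative parallel-environment count
states = jnp.zeros((n_envs, 2))
actions = jnp.ones((n_envs,))

states = batched_step(states, actions)
print(states.shape)  # (4096, 2)
```

In the full pipeline, the learning update (policy gradient step, value update, etc.) would be composed with this batched step inside the same `jax.jit` boundary, which is what lets simulation and training fuse into a single compiled program.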