Deep Reinforcement Learning (DRL) offers a robust alternative to traditional control methods for autonomous underwater docking, particularly in adapting to unpredictable environmental conditions. However, bridging the "sim-to-real" gap and managing high training latencies remain significant bottlenecks for practical deployment. This paper presents a systematic approach to autonomous docking of the Girona Autonomous Underwater Vehicle (AUV) that leverages a high-fidelity digital-twin environment. We adapted the Stonefish simulator into a multiprocessing RL framework that significantly accelerates learning while incorporating realistic AUV dynamics, collision models, and sensor noise. Using the Proximal Policy Optimization (PPO) algorithm, we trained a 6-DoF control policy in a headless environment with randomized starting positions to encourage generalized performance. Our reward structure accounts for distance, orientation, action smoothness, and adaptive collision penalties to facilitate soft docking. In simulation, the agent achieved a success rate above 90%. Furthermore, successful validation in a physical test tank confirmed the efficacy of the sim-to-real transfer, with the DRL controller exhibiting emergent behaviors such as pitch-based braking and yaw oscillations that assist mechanical alignment.
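The shaped reward described above can be sketched as a minimal Python function. This is an illustrative assumption of how the four named terms might combine, not the authors' exact formulation: the weights, the scalar orientation-error input, and the impact-speed scaling of the "adaptive" collision penalty are all hypothetical.

```python
import numpy as np

def docking_reward(dist, prev_dist, orient_err, action, prev_action,
                   collided, impact_speed,
                   w_dist=1.0, w_orient=0.5, w_smooth=0.1, k_coll=10.0):
    """Hypothetical shaped reward with the four terms named in the abstract:
    progress toward the dock, orientation error, action smoothness, and an
    adaptive (impact-speed-scaled) collision penalty for soft docking."""
    r = w_dist * (prev_dist - dist)            # reward progress toward the dock
    r -= w_orient * orient_err                 # penalize misalignment with the dock axis
    r -= w_smooth * float(np.linalg.norm(      # discourage jerky thruster commands
        np.asarray(action) - np.asarray(prev_action)))
    if collided:
        r -= k_coll * impact_speed             # harder impacts cost more
    return r
```

Scaling the collision penalty by impact speed, rather than applying a fixed cost, is one way to realize the "adaptive" penalty: grazing contacts during final alignment are tolerated while high-speed impacts are strongly discouraged.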