Deep Reinforcement Learning for Sim-to-Real Policy Transfer of VTOL-UAVs Offshore Docking Operations

This paper proposes a novel Reinforcement Learning (RL) approach for sim-to-real policy transfer of Vertical Take-Off and Landing Unmanned Aerial Vehicles (VTOL-UAVs). The approach is designed for VTOL-UAV landing on offshore docking stations in maritime operations. In such operations, the range of a VTOL-UAV is limited primarily by its battery capacity. Autonomous landing on a charging platform offers a way to mitigate this limitation by enabling battery charging and data transfer. However, current Deep Reinforcement Learning (DRL) methods exhibit drawbacks, including lengthy training times and modest success rates. In this paper, we address these concerns by decomposing the landing procedure into a sequence of more manageable but analogous tasks: an approach phase and a landing phase. The proposed architecture uses a model-based control scheme for the approach phase, in which the VTOL-UAV approaches the offshore docking station. In the landing phase, DRL agents are trained offline to learn the optimal policy for docking on the offshore station. The Joint North Sea Wave Project (JONSWAP) spectrum model is used to create a wave model for each episode, enhancing policy generalization for sim-to-real transfer. A set of DRL algorithms has been tested through numerical simulations, including value-based agents such as Deep \textit{Q} Networks (DQN) and policy-based agents such as Proximal Policy Optimization (PPO). The numerical experiments show that the PPO agent can learn complicated and efficient policies to land in uncertain environments, which in turn enhances the likelihood of successful sim-to-real transfer.
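
The per-episode wave randomization described in the abstract could be sketched roughly as follows. The abstract gives no spectrum parameters, so everything specific here is an assumption for illustration: the significant-wave-height / peak-period form of the JONSWAP spectrum, the peak-enhancement factor of 3.3, the sampling ranges, the random-phase superposition used to obtain a heave signal, and the function names `jonswap_spectrum` and `sample_episode_wave`.

```python
import numpy as np


def jonswap_spectrum(omega, hs, tp, gamma=3.3):
    """JONSWAP spectral density S(omega) in m^2*s/rad.

    Uses the common Hs/Tp parameterization; omega must be strictly positive.
    gamma=3.3 is an assumed peak-enhancement factor, not a value from the paper.
    """
    omega = np.asarray(omega, dtype=float)
    omega_p = 2.0 * np.pi / tp                      # peak angular frequency
    sigma = np.where(omega <= omega_p, 0.07, 0.09)  # spectral width parameter
    r = np.exp(-0.5 * ((omega - omega_p) / (sigma * omega_p)) ** 2)
    a_gamma = 1.0 - 0.287 * np.log(gamma)           # approximate normalization
    pm = (5.0 / 16.0) * hs**2 * omega_p**4 * omega**-5 \
        * np.exp(-1.25 * (omega / omega_p) ** -4)   # Pierson-Moskowitz part
    return a_gamma * pm * gamma**r


def sample_episode_wave(rng, hs_range=(0.5, 2.5), tp_range=(5.0, 10.0),
                        n_components=200):
    """Draw one sea state and return (hs, tp, heave), where heave(t) is the
    platform elevation in metres. The ranges are hypothetical placeholders."""
    hs = rng.uniform(*hs_range)
    tp = rng.uniform(*tp_range)
    omega_p = 2.0 * np.pi / tp
    omega = np.linspace(0.3 * omega_p, 4.0 * omega_p, n_components)
    d_omega = omega[1] - omega[0]
    # Component amplitudes from the spectrum, with independent random phases.
    amp = np.sqrt(2.0 * jonswap_spectrum(omega, hs, tp) * d_omega)
    phase = rng.uniform(0.0, 2.0 * np.pi, n_components)

    def heave(t):
        t = np.atleast_1d(np.asarray(t, dtype=float))
        return (amp * np.cos(np.outer(t, omega) + phase)).sum(axis=1)

    return hs, tp, heave


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hs, tp, heave = sample_episode_wave(rng)  # call once per training episode
    print(f"Hs={hs:.2f} m, Tp={tp:.2f} s, heave(0..2 s)={heave([0.0, 1.0, 2.0])}")
```

Calling `sample_episode_wave` at every environment reset would expose the agent to a different sea state in each episode, which is one plausible way to realize the generalization effect the abstract attributes to the wave model.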
