This paper proposes a novel Reinforcement Learning (RL) approach for sim-to-real policy transfer of Vertical Take-Off and Landing Unmanned Aerial Vehicles (VTOL-UAVs). The proposed approach is designed for VTOL-UAVs landing on offshore docking stations in maritime operations. In such operations, the operational range of VTOL-UAVs is limited, primarily by their battery capacity. Autonomous landing on a charging platform offers a promising way to mitigate this limitation by enabling battery charging and data transfer. However, current Deep Reinforcement Learning (DRL) methods suffer from drawbacks, including lengthy training times and modest success rates. In this paper, we address these concerns by decomposing the landing procedure into two more manageable but analogous tasks: an approach phase and a landing phase. The proposed architecture uses a model-based control scheme for the approach phase, in which the VTOL-UAV approaches the offshore docking station. In the landing phase, DRL agents are trained offline to learn the optimal policy for docking on the offshore station. The Joint North Sea Wave Project (JONSWAP) spectrum model is employed to generate a wave model for each episode, enhancing policy generalization for sim-to-real transfer. A set of DRL algorithms, including value-based and policy-based agents such as Deep \textit{Q}-Networks (DQN) and Proximal Policy Optimization (PPO), respectively, has been tested through numerical simulations. The numerical experiments show that the PPO agent can learn complex and efficient policies for landing in uncertain environments, which in turn increases the likelihood of successful sim-to-real transfer.
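The per-episode wave randomization mentioned above can be illustrated with a short sketch. It assumes the common DNV-RP-C205 form of the JONSWAP spectrum (peak-enhancement factor $\gamma = 3.3$, normalizing factor $A_\gamma = 1 - 0.287\ln\gamma$) and a random-phase superposition of harmonic components to obtain the sea-surface elevation $\eta(t)$. The function names `jonswap_spectrum` and `sample_episode_waves`, the frequency grid, and the sampled ranges for significant wave height $H_s$ and peak period $T_p$ are hypothetical choices for illustration, not the paper's actual parameterization.

```python
import numpy as np


def jonswap_spectrum(omega, hs, tp, gamma=3.3):
    """JONSWAP spectral density S(omega) [m^2 s/rad], DNV-RP-C205 form (assumed)."""
    wp = 2.0 * np.pi / tp  # peak angular frequency [rad/s]
    # Pierson-Moskowitz base spectrum; it integrates to hs**2 / 16
    s_pm = (5.0 / 16.0) * hs**2 * wp**4 * omega**-5.0 * np.exp(-1.25 * (omega / wp) ** -4.0)
    # Spectral width parameter: narrower below the peak than above it
    sigma = np.where(omega <= wp, 0.07, 0.09)
    # Peak-enhancement term gamma**r
    peak = gamma ** np.exp(-0.5 * ((omega - wp) / (sigma * wp)) ** 2)
    # Approximate normalization so the total energy stays close to hs**2 / 16
    a_gamma = 1.0 - 0.287 * np.log(gamma)
    return a_gamma * s_pm * peak


def sample_episode_waves(rng, n_components=200, w_min=0.2, w_max=4.0):
    """Draw a random sea state for one episode and return (hs, tp, eta).

    Hypothetical: the Hs and Tp ranges below are illustrative only.
    """
    hs = rng.uniform(0.5, 2.5)   # significant wave height [m] (assumed range)
    tp = rng.uniform(5.0, 10.0)  # peak period [s] (assumed range)
    omega = np.linspace(w_min, w_max, n_components)
    d_omega = omega[1] - omega[0]
    # Component amplitudes a_i = sqrt(2 S(omega_i) d_omega)
    amp = np.sqrt(2.0 * jonswap_spectrum(omega, hs, tp) * d_omega)
    # Independent uniform random phases give a new wave realization each episode
    phase = rng.uniform(0.0, 2.0 * np.pi, n_components)

    def eta(t):
        """Surface elevation [m] at time(s) t; always returns a 1-D array."""
        t = np.atleast_1d(np.asarray(t, dtype=float))[:, None]
        return np.sum(amp * np.cos(omega * t + phase), axis=1)

    return hs, tp, eta
```

For example, `hs, tp, eta = sample_episode_waves(np.random.default_rng(0))` followed by `eta(np.linspace(0.0, 60.0, 601))` yields one minute of surface elevation that could drive the heave of a simulated docking station. Calling the sampler again at each environment reset, with a fresh seed, produces a different sea state per episode. This is the kind of randomization the abstract credits with improving generalization.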