Reinforcement Learning Based Autonomous Multi-Rotor Landing on Moving Platforms

Multi-rotor UAVs suffer from restricted range and flight duration due to limited battery capacity. Autonomous landing on a 2D moving platform offers the possibility to replenish batteries and offload data, thus increasing the utility of the vehicle. Classical approaches rely on accurate, complex and difficult-to-derive models of the vehicle and the environment. Reinforcement learning (RL) provides an attractive alternative because it can learn a suitable control policy exclusively from data during training. However, current methods require several hours of training, achieve limited success rates and depend on hyperparameters that must be tuned by trial and error. We address all of these issues in this work. First, we decompose the landing procedure into a sequence of simpler but similar learning tasks. This is enabled by using two instances of the same RL-based controller, trained for 1D motion, to control the multi-rotor's movement in the longitudinal and the lateral direction. Second, we introduce a state space discretization technique that is based on i) kinematic modeling of the moving platform to derive information about the state space topology and ii) structuring the training as a sequential curriculum using transfer learning. Third, we leverage the kinematic model of the moving platform to also derive interpretable hyperparameters for the training process that ensure sufficient maneuverability of the multi-rotor vehicle. The training is performed using the tabular RL method Double Q-Learning. Through extensive simulations we show that the presented method significantly increases the rate of successful landings while requiring less training time than existing deep RL approaches. Finally, we deploy and demonstrate our algorithm on real hardware. For all evaluation scenarios we provide statistics on the agent's performance.
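To make the core idea concrete, the following is a minimal sketch of tabular Double Q-Learning for a single 1D landing axis, with the same learned table reused for the longitudinal and the lateral direction. It is not the authors' implementation. The state variables (relative position and velocity), the bin edges, the three-element action set, the reward and all hyperparameter values are hypothetical placeholders. The kinematics-derived discretization and the curriculum with transfer learning described above are not reproduced here.

```python
import random
from collections import defaultdict

# Hypothetical action set: decrease, keep, or increase the commanded tilt angle.
ACTIONS = (-1, 0, 1)


def discretize(rel_pos, rel_vel, pos_bins, vel_bins):
    """Map a continuous 1D relative position/velocity to a tuple of bin indices.

    pos_bins and vel_bins are sorted bin edges (hypothetical values); the
    paper instead derives its discretization from a platform kinematic model.
    """
    def bin_index(x, edges):
        for i, edge in enumerate(edges):
            if x < edge:
                return i
        return len(edges)

    return (bin_index(rel_pos, pos_bins), bin_index(rel_vel, vel_bins))


class DoubleQAgent1D:
    """Tabular Double Q-Learning agent for one axis of motion (a sketch)."""

    def __init__(self, alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
        self.q_a = defaultdict(float)  # first value estimate, keyed by (state, action)
        self.q_b = defaultdict(float)  # second, independently updated estimate
        self.alpha = alpha
        self.gamma = gamma
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def greedy(self, state):
        # Act greedily with respect to the sum of both estimates.
        return max(ACTIONS, key=lambda a: self.q_a[(state, a)] + self.q_b[(state, a)])

    def act(self, state):
        # Epsilon-greedy exploration used during training.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return self.greedy(state)

    def update(self, s, a, r, s_next, done):
        # Randomly choose which table to update; the other one evaluates the action.
        if self.rng.random() < 0.5:
            upd, other = self.q_a, self.q_b
        else:
            upd, other = self.q_b, self.q_a
        if done:
            target = r
        else:
            # Select the best next action with one table, evaluate it with the other.
            a_star = max(ACTIONS, key=lambda b: upd[(s_next, b)])
            target = r + self.gamma * other[(s_next, a_star)]
        upd[(s, a)] += self.alpha * (target - upd[(s, a)])


def two_axis_commands(agent, rel_pos_xy, rel_vel_xy, pos_bins, vel_bins):
    """Apply one trained 1D agent independently to the longitudinal and lateral axes."""
    x_state = discretize(rel_pos_xy[0], rel_vel_xy[0], pos_bins, vel_bins)
    y_state = discretize(rel_pos_xy[1], rel_vel_xy[1], pos_bins, vel_bins)
    return agent.greedy(x_state), agent.greedy(y_state)
```

Keeping two value tables and decoupling action selection from action evaluation is what distinguishes Double Q-Learning from plain Q-Learning. It reduces the overestimation bias of the max operator. Because both axes share one 1D table in this sketch, only a single small learning problem has to be solved.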
