Inverted Landing in a Small Aerial Robot via Deep Reinforcement Learning for Triggering and Control of Rotational Maneuvers

Rapid and robust inverted landing is a challenging feat for aerial robots, especially when relying entirely on onboard sensing and computation. Nevertheless, biological fliers such as bats, flies, and bees perform it routinely. Our previous work identified a direct causal connection between a series of onboard visual cues and kinematic actions that allows small aerial robots to execute this challenging aerobatic maneuver reliably. In this work, we first used Deep Reinforcement Learning and a physics-based simulation to obtain a general, optimal control policy for robust inverted landing from arbitrary approach conditions. The policy was trained over a wide range of approach flight velocities that varied in magnitude and direction. The resulting policy provides a computationally efficient mapping from the system's observation space to its motor-command action space, covering both the triggering and the control of rotational maneuvers. Next, we performed a sim-to-real transfer of the learned policy via domain randomization, varying the robot's inertial parameters in the simulation, and validated it experimentally. Through experimental trials, we identified several dominant factors that greatly improved landing robustness, as well as the primary mechanisms that determined inverted landing success. We expect that the learning framework developed in this study can be generalized to more challenging tasks, such as using noisy onboard sensory data, landing on surfaces of various orientations, or landing on dynamically moving surfaces.
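
As an illustration of the training setup described above, the following minimal sketch samples one simulated episode: an approach velocity that varies in magnitude and direction, plus randomized inertial parameters for domain randomization. It is not the authors' implementation. The names `EpisodeConfig` and `sample_episode`, the uniform sampling scheme, and every numeric range and nominal value are assumptions chosen only for illustration.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class EpisodeConfig:
    """Settings for one simulated landing attempt (hypothetical structure)."""
    vx: float          # horizontal approach velocity [m/s]
    vz: float          # vertical approach velocity [m/s]
    mass: float        # randomized body mass [kg]
    inertia_yy: float  # randomized pitch-axis moment of inertia [kg m^2]


def sample_episode(rng,
                   speed_range=(1.0, 3.5),        # assumed speed bounds [m/s]
                   angle_range_deg=(15.0, 90.0),  # assumed approach-angle bounds
                   nominal_mass=0.035,            # assumed nominal mass [kg]
                   nominal_inertia=1.7e-5,        # assumed nominal inertia [kg m^2]
                   spread=0.1):                   # assumed +/-10% randomization
    """Draw an approach velocity and randomized inertial parameters."""
    # Approach condition: vary both the magnitude and the direction.
    speed = rng.uniform(*speed_range)
    angle = math.radians(rng.uniform(*angle_range_deg))

    # Domain randomization: scale each inertial parameter independently.
    mass = nominal_mass * rng.uniform(1.0 - spread, 1.0 + spread)
    inertia = nominal_inertia * rng.uniform(1.0 - spread, 1.0 + spread)

    return EpisodeConfig(vx=speed * math.cos(angle),
                         vz=speed * math.sin(angle),
                         mass=mass,
                         inertia_yy=inertia)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for repeatable sampling
    for _ in range(3):
        print(sample_episode(rng))
```

In a training loop, one such configuration would be drawn at every episode reset, so that the policy sees a different approach condition and a slightly different robot each time.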
