Flight control for autonomous micro aerial vehicles (MAVs) is evolving from steady flight near equilibrium points toward more aggressive aerobatic maneuvers, such as flips, rolls, and the Power Loop. Although reinforcement learning (RL) has shown great potential for these tasks, conventional RL methods often suffer from low data efficiency and limited generalization. This challenge becomes more pronounced in multi-task scenarios, where a single policy is required to master multiple maneuvers. In this paper, we propose a novel end-to-end multi-task reinforcement learning framework, called GEAR (Geometric Equivariant Aerobatics Reinforcement), which exploits the inherent SO(2) rotational symmetry of MAV dynamics and explicitly incorporates this property into the policy network architecture. By integrating an equivariant actor network, FiLM-based task modulation, and a multi-head critic, GEAR achieves both efficiency and flexibility in learning diverse aerobatic maneuvers, yielding a data-efficient, robust, and unified framework for aerobatic control. GEAR attains a 98.85\% success rate across a variety of aerobatic tasks, significantly outperforming baseline methods. In real-world experiments, GEAR demonstrates stable execution of multiple maneuvers and the ability to combine basic motion primitives into complex aerobatic sequences.