Although acrobatic flight control has been studied extensively, existing methods share one key limitation: they are usually restricted to specific maneuver tasks and cannot change flight pattern parameters online. In this work, we propose a target-and-command-oriented reinforcement learning (TACO) framework that handles different maneuver tasks in a unified way and allows flight pattern parameters to be changed online. Additionally, we propose a spectral normalization method with input-output rescaling that enhances the policy's temporal and spatial smoothness, independence, and symmetry, thereby overcoming the sim-to-real gap. We validate the TACO approach through extensive simulation and real-world experiments, demonstrating its capability to achieve high-speed circular flights and continuous multi-flips.
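To make the idea of spectral normalization with input-output rescaling more concrete, the following is a minimal sketch for a single linear layer. It is not the TACO implementation: the abstract does not specify how the rescaling is applied, so the elementwise `in_scale` and `out_scale` vectors, the `max_gain` bound, and all function names here are hypothetical choices made for illustration. The sketch estimates the largest singular value of a weight matrix by power iteration, scales the matrix down so that this value does not exceed the bound, and wraps the bounded layer between an input division and an output multiplication.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def transpose(W):
    """Return the transpose of W as a list of rows."""
    return [list(col) for col in zip(*W)]


def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(a * a for a in v))


def spectral_norm(W, iters=200, seed=0):
    """Estimate the largest singular value of W by power iteration on W^T W."""
    rng = random.Random(seed)
    Wt = transpose(W)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(W[0]))]
    for _ in range(iters):
        v = matvec(Wt, matvec(W, v))
        n = norm(v)
        if n == 0.0:
            return 0.0
        v = [a / n for a in v]
    return norm(matvec(W, v))


def spectrally_normalize(W, max_gain=1.0):
    """Scale W down (never up) so its spectral norm is at most max_gain."""
    sigma = spectral_norm(W)
    scale = min(1.0, max_gain / sigma) if sigma > 0.0 else 1.0
    return [[w * scale for w in row] for row in W]


def rescaled_layer(W, x, in_scale, out_scale):
    """Assumed rescaling scheme: y = out_scale * (W_sn @ (x / in_scale)).

    in_scale and out_scale are hypothetical per-dimension factors that map
    physical units to and from the normalized range the bounded layer sees.
    """
    W_sn = spectrally_normalize(W)
    z = matvec(W_sn, [xi / s for xi, s in zip(x, in_scale)])
    return [zi * s for zi, s in zip(z, out_scale)]
```

Bounding the spectral norm of each layer bounds how much the output can change for a given change in the input, which is one common way to encourage a smooth policy. The rescaling factors are there because inputs and outputs with very different physical ranges would otherwise be constrained unevenly by a single gain bound.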