Developing optimal controllers for aggressive, high-speed quadcopter flight remains a significant challenge in robotics. Recent work increasingly relies on neural network controllers trained through supervised or reinforcement learning. However, sim-to-real transfer introduces a reality gap, which requires robust inner-loop controllers during real flights and thereby limits the network's control authority and flight performance. In this paper, we investigate, for the first time, an end-to-end neural network controller that addresses the reality gap without being restricted by an inner-loop controller. The networks, referred to as G&CNets, are trained on a dataset of optimal trajectories to learn an energy-optimal policy that maps the quadcopter's state to rpm commands. In hover-to-hover flights, we identified unmodeled moments as a significant contributor to the reality gap. To mitigate this, we propose an adaptive control strategy in which the network learns from optimal trajectories of a system affected by constant external pitch, roll, and yaw moments. In real test flights, this model mismatch is estimated onboard and fed to the network to obtain the optimal rpm command. We demonstrate the effectiveness of our method by performing energy-optimal hover-to-hover flights with and without moment feedback. Finally, we compare the adaptive controller to a state-of-the-art differential-flatness-based controller in a consecutive waypoint flight and demonstrate the advantages of our method in terms of energy optimality and robustness.
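The adaptive scheme described in the abstract, in which an onboard estimate of the external moments is fed to the network together with the state, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 12-dimensional state, the layer sizes, the rpm limits, the random untrained weights standing in for a network fitted to optimal trajectories, and the first-order low-pass filter used as the moment estimator.

```python
import math
import random

# All sizes and limits below are illustrative assumptions, not values from the paper.
STATE_DIM = 12      # e.g. position, velocity, attitude, body rates (3 each)
MOMENT_DIM = 3      # estimated constant external roll, pitch and yaw moments
N_ROTORS = 4
HIDDEN = 32
RPM_MIN, RPM_MAX = 3000.0, 12000.0


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) with small random weights and zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_policy(rng):
    """Untrained stand-in for a network fitted to optimal trajectories."""
    sizes = [STATE_DIM + MOMENT_DIM, HIDDEN, HIDDEN, N_ROTORS]
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]


def dense(layer, x):
    """Affine map W x + b for one fully connected layer."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def policy(layers, state, moment_estimate):
    """Map state plus estimated external moments directly to rotor rpm commands."""
    x = list(state) + list(moment_estimate)
    for layer in layers[:-1]:
        x = [max(0.0, v) for v in dense(layer, x)]  # ReLU hidden layers
    out = dense(layers[-1], x)
    # Sigmoid (written with tanh for numerical safety) keeps commands within limits.
    return [RPM_MIN + (RPM_MAX - RPM_MIN) * 0.5 * (1.0 + math.tanh(0.5 * v))
            for v in out]


def update_moment_estimate(previous, residual, alpha=0.1):
    """Hypothetical estimator: low-pass filter on the measured moment residual."""
    return [p + alpha * (r - p) for p, r in zip(previous, residual)]
```

In a control loop, `update_moment_estimate` would be called on each new residual between measured and modeled moments, and its output passed to `policy` alongside the current state. Because the commands go straight to the rotors, no inner-loop controller appears in this sketch.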