Developing optimal controllers for aggressive high-speed quadcopter flight is a major challenge in robotics. Recent work has shown that neural networks trained with supervised learning can achieve real-time optimal control in specific scenarios. In these methods, the networks (termed G&CNets) are trained to learn the optimal state feedback from a dataset of optimal trajectories. A key problem with these methods is the reality gap encountered during sim-to-real transfer. In this work, we train G&CNets for energy-optimal end-to-end control of the Bebop drone and identify the unmodeled pitch moment as the main contributor to the reality gap. To mitigate this, we propose an adaptive control strategy that learns from optimal trajectories of a system subject to constant external pitch, roll, and yaw moments. In real test flights, this model mismatch is estimated onboard and fed to the network to obtain the optimal rpm command. We demonstrate the effectiveness of our method by performing energy-optimal hover-to-hover flights with and without moment feedback. Finally, we compare the adaptive controller to a state-of-the-art differential-flatness-based controller in a consecutive waypoint flight and demonstrate the advantages of our method in terms of energy optimality and robustness.
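The adaptive strategy combines two pieces: an onboard estimate of the unmodeled body moment and a network that receives that estimate as an extra input alongside the state. The following is a minimal Python sketch of how such a loop could be wired, assuming the disturbance is recovered from Euler's rigid-body equation, I·ω̇ + ω × (I·ω) = M_rotor + M_ext, and smoothed with an exponential low-pass filter. The inertia values, rotor coefficients, motor layout, rpm bounds, the `MomentEstimator` and `TinyPolicy` classes, and the randomly initialized weights are all hypothetical placeholders, not the authors' implementation or identified Bebop parameters; a real G&CNet would be trained on optimal trajectories and would typically use normalized inputs.

```python
import math
import random
from typing import List, Sequence

Vec3 = List[float]

# Hypothetical placeholder parameters (NOT identified Bebop values).
INERTIA = (9.0e-4, 9.0e-4, 1.6e-3)  # kg m^2, diagonal (Ixx, Iyy, Izz)
ARM = 0.08                          # m, assumed rotor arm length
K_F = 3.0e-8                        # N / rpm^2, assumed thrust coefficient
K_M = 4.0e-10                       # N m / rpm^2, assumed drag-torque coefficient


def cross(a: Sequence[float], b: Sequence[float]) -> Vec3:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def rotor_moment(rpm: Sequence[float]) -> Vec3:
    """Modeled body moment from four rotor speeds (assumed X layout).

    Assumed ordering: front-left, front-right, rear-right, rear-left, with
    FL/RR spinning one way and FR/RL the other; signs are illustrative.
    """
    w2 = [r * r for r in rpm]
    roll = ARM * K_F * ((w2[0] + w2[3]) - (w2[1] + w2[2]))
    pitch = ARM * K_F * ((w2[0] + w2[1]) - (w2[2] + w2[3]))
    yaw = K_M * ((w2[0] + w2[2]) - (w2[1] + w2[3]))
    return [roll, pitch, yaw]


class MomentEstimator:
    """Low-pass estimate of a constant unmodeled body moment M_ext."""

    def __init__(self, alpha: float = 0.05) -> None:
        self.alpha = alpha  # small alpha: slower but less noisy estimate
        self.m_ext: Vec3 = [0.0, 0.0, 0.0]

    def update(self, omega: Sequence[float], omega_dot: Sequence[float],
               rpm: Sequence[float]) -> Vec3:
        # Solve Euler's equation for the residual moment the model misses.
        i_omega = [INERTIA[k] * omega[k] for k in range(3)]
        gyro = cross(omega, i_omega)
        m_rot = rotor_moment(rpm)
        raw = [INERTIA[k] * omega_dot[k] + gyro[k] - m_rot[k] for k in range(3)]
        # Exponential moving average, consistent with a constant disturbance.
        self.m_ext = [(1.0 - self.alpha) * m + self.alpha * r
                      for m, r in zip(self.m_ext, raw)]
        return self.m_ext


class TinyPolicy:
    """Untrained stand-in for a G&CNet: [state, M_ext] -> four rpm commands."""

    def __init__(self, n_in: int, n_hidden: int = 16, rpm_min: float = 3000.0,
                 rpm_max: float = 12000.0, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(4)]
        self.b2 = [0.0] * 4
        self.rpm_min, self.rpm_max = rpm_min, rpm_max

    def __call__(self, x: Sequence[float]) -> List[float]:
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = [sum(w * hi for w, hi in zip(row, h)) + b
             for row, b in zip(self.w2, self.b2)]
        # Sigmoid output scaled to the admissible rpm range keeps commands feasible.
        span = self.rpm_max - self.rpm_min
        return [self.rpm_min + span / (1.0 + math.exp(-zi)) for zi in z]
```

In use, each control step would call `est.update(...)` with the gyro rate, a (hypothetically numerically differentiated) angular acceleration, and the current rpm, then pass `state + m_hat` to the policy. Because the external moments are modeled as constant, a small `alpha` trades responsiveness for rejection of noise in the differentiated gyro signal.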