Attitude control of fixed-wing unmanned aerial vehicles (UAVs) is a difficult control problem, in part because of uncertain nonlinear dynamics, actuator constraints, and coupled longitudinal and lateral motions. Current state-of-the-art autopilots are based on linear control and are thus limited in their effectiveness and performance. Deep reinforcement learning (DRL) is a machine learning method that automatically discovers optimal control laws through interaction with the controlled system and can handle complex nonlinear dynamics. We show in this paper that DRL can successfully learn to perform attitude control of a fixed-wing UAV operating directly on the original nonlinear dynamics, requiring as little as three minutes of flight data. We first train our model in a simulation environment and then deploy the learned controller on the UAV in flight tests, demonstrating performance comparable to the state-of-the-art ArduPlane proportional-integral-derivative (PID) attitude controller with no further online learning required. Training with significant actuation delay and with diversified simulated dynamics was found to be crucial for successful transfer to control of the real UAV. In addition to a qualitative comparison with the ArduPlane autopilot, we present a quantitative assessment based on linear analysis to better understand the learned controller's behavior.