Quadcopters have been studied for decades because of their maneuverability and their ability to operate in a wide variety of conditions. However, quadcopter dynamics are nonlinear, and actuator saturation and sensor noise add further difficulty. Together these make it challenging and time-consuming to obtain accurate dynamic models and to achieve satisfactory control performance. Deep reinforcement learning has shown significant potential for the modeling and control of autonomous multirotor aerial vehicles, with recent advances in deployment, performance enhancement, and generalization. In this paper, we propose an end-to-end deep reinforcement learning-based controller for quadcopters that is safe for real-world deployment, data-efficient, and free of manual gain tuning. First, a novel actor-critic-based architecture is designed to map the robot states directly to the motor outputs. Then, a simulator based on quadcopter dynamics is developed to facilitate training of the controller policy. Finally, the trained policy is deployed on a real Crazyflie nano quadrotor platform without any additional fine-tuning. Experimental results show that the quadcopter tracks a given complex trajectory with satisfactory performance. This demonstrates the effectiveness and feasibility of the proposed method and indicates its ability to bridge the simulation-to-reality gap.
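The abstract names two components: a simulator built on quadcopter dynamics and a policy whose outputs are motor commands. As a rough illustration only, the Python sketch below shows one way a single simulator step could connect the two. Normalized actor outputs in [-1, 1] are mapped to saturated per-motor thrusts, and a rigid-body model (ZYX Euler angles, body rates) is integrated with explicit Euler. All constants (`MASS`, `ARM`, `KAPPA`, `INERTIA`, `MAX_THRUST`), the X-frame motor layout, the action-to-thrust mapping, and the function names `action_to_thrust` and `step` are hypothetical assumptions. None of them describe the paper's actual simulator.

```python
import math

# Hypothetical Crazyflie-like constants (assumed for illustration, not from the paper).
MASS = 0.027                          # kg
G = 9.81                              # m/s^2
ARM = 0.046                           # m, center-to-motor distance (assumed)
KAPPA = 0.006                         # m, yaw-torque-to-thrust ratio (assumed)
INERTIA = (1.4e-5, 1.4e-5, 2.17e-5)   # kg m^2, diagonal inertia (assumed)
MAX_THRUST = 0.15                     # N per motor, actuator saturation limit (assumed)

# Assumed X configuration: motor (x, y) positions in the body frame and spin signs.
_D = ARM / math.sqrt(2.0)
MOTOR_XY = [(_D, -_D), (-_D, -_D), (-_D, _D), (_D, _D)]
SPIN = [-1.0, 1.0, -1.0, 1.0]


def action_to_thrust(action):
    """Map normalized actor outputs in [-1, 1] to thrusts clipped to [0, MAX_THRUST]."""
    return [min(max((a + 1.0) / 2.0, 0.0), 1.0) * MAX_THRUST for a in action]


def step(state, action, dt=0.002):
    """One explicit-Euler step of rigid-body quadcopter dynamics.

    state: dict with 'pos' and 'vel' (world frame), 'euler' (roll, pitch, yaw,
    ZYX convention) and 'omega' (body rates p, q, r), each a 3-element list.
    """
    f = action_to_thrust(action)
    thrust = sum(f)
    # Body torques from r x F with F = f_i * e_z, plus an assumed rotor-drag yaw torque.
    tau = (
        sum(y * fi for (x, y), fi in zip(MOTOR_XY, f)),
        -sum(x * fi for (x, y), fi in zip(MOTOR_XY, f)),
        sum(s * KAPPA * fi for s, fi in zip(SPIN, f)),
    )
    phi, theta, psi = state["euler"]
    p, q, r = state["omega"]
    # Thrust direction in the world frame: third column of R = Rz(psi) Ry(theta) Rx(phi).
    z_body = (
        math.cos(psi) * math.sin(theta) * math.cos(phi) + math.sin(psi) * math.sin(phi),
        math.sin(psi) * math.sin(theta) * math.cos(phi) - math.cos(psi) * math.sin(phi),
        math.cos(theta) * math.cos(phi),
    )
    acc = [thrust / MASS * z for z in z_body]
    acc[2] -= G
    # Euler's rotation equation: I * omega_dot = tau - omega x (I * omega).
    ix, iy, iz = INERTIA
    omega_dot = (
        (tau[0] - (iz - iy) * q * r) / ix,
        (tau[1] - (ix - iz) * p * r) / iy,
        (tau[2] - (iy - ix) * p * q) / iz,
    )
    # Map body rates to ZYX Euler-angle rates (singular at pitch = +/-90 degrees).
    euler_dot = (
        p + math.sin(phi) * math.tan(theta) * q + math.cos(phi) * math.tan(theta) * r,
        math.cos(phi) * q - math.sin(phi) * r,
        (math.sin(phi) * q + math.cos(phi) * r) / math.cos(theta),
    )
    return {
        "pos": [x + dt * v for x, v in zip(state["pos"], state["vel"])],
        "vel": [v + dt * a for v, a in zip(state["vel"], acc)],
        "euler": [e + dt * d for e, d in zip(state["euler"], euler_dot)],
        "omega": [w + dt * d for w, d in zip(state["omega"], omega_dot)],
    }
```

For example, the hover action under these assumed constants is `2 * MASS * G / (4 * MAX_THRUST) - 1` on every motor. Stepping a level, resting state with that action should leave all velocities approximately zero. Explicit Euler is used here only for brevity. A training simulator would more likely use a finer or higher-order integrator and add sensor noise and motor lag.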