Control design for robotic systems is complex and often requires solving an optimization problem to follow a trajectory accurately. Online optimization approaches such as Model Predictive Control (MPC) achieve strong tracking performance but demand substantial computing power. Conversely, learning-based offline optimization approaches such as Reinforcement Learning (RL) execute quickly and efficiently on the robot but rarely match the accuracy of MPC in trajectory tracking tasks. For systems with limited compute, such as aerial vehicles, a controller that is both accurate and efficient at execution time is imperative. We propose an Analytic Policy Gradient (APG) method to tackle this problem. APG exploits the availability of differentiable simulators by training a controller offline with gradient descent on the tracking error. We address the training instabilities that frequently occur with APG through curriculum learning, and we experiment on a widely used controls benchmark, the CartPole, and on two common aerial robots, a quadrotor and a fixed-wing drone. Our proposed method outperforms both model-based and model-free RL methods in terms of tracking error. At the same time, it achieves performance similar to MPC while requiring more than an order of magnitude less computation time. Our work provides insights into the potential of APG as a promising control method for robotics. To facilitate the exploration of APG, we open-source our code at https://github.com/lis-epfl/apg_trajectory_tracking.
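To make the core idea concrete, the following is a minimal sketch of training a controller by gradient descent on the tracking error, with the gradient propagated analytically through the dynamics. It is not the authors' implementation. Everything in it is an illustrative assumption: the one-dimensional system x_{t+1} = x_t + u_t, the single-gain policy u_t = k * (r_{t+1} - x_t), the ramp reference, the learning rate, and the names `rollout` and `train`. The curriculum is approximated here by a rollout horizon that grows between training stages, which may differ from the curriculum used in the paper.

```python
def rollout(k, reference, x0=0.0):
    """Roll out x_{t+1} = x_t + u_t with policy u_t = k * (r_{t+1} - x_t).

    Returns the mean squared tracking error and its exact derivative
    with respect to k, obtained by differentiating through the dynamics.
    Both the system and the policy are toy assumptions for illustration.
    """
    x, dx_dk = x0, 0.0
    loss, dloss_dk = 0.0, 0.0
    horizon = len(reference) - 1
    for t in range(horizon):
        gap = reference[t + 1] - x  # distance to the next waypoint
        # Chain rule through policy and dynamics (uses the pre-update state).
        dx_dk = dx_dk + gap - k * dx_dk
        x = x + k * gap
        err = x - reference[t + 1]
        loss += err ** 2 / horizon
        dloss_dk += 2.0 * err * dx_dk / horizon
    return loss, dloss_dk


def train(reference, k=0.5, lr=0.01, horizons=(5, 10, 20), steps_per_stage=200):
    """Gradient descent on the tracking error with a growing-horizon curriculum."""
    for horizon in horizons:
        segment = reference[: horizon + 1]  # short rollouts first, longer later
        for _ in range(steps_per_stage):
            _, grad = rollout(k, segment)
            k -= lr * grad
    return k


# Hypothetical reference trajectory: a unit-slope ramp with 20 steps.
reference = [float(t) for t in range(21)]
loss_before, _ = rollout(0.5, reference)
k_trained = train(reference)
loss_after, _ = rollout(k_trained, reference)
print(f"gain: {k_trained:.4f}, loss: {loss_before:.4f} -> {loss_after:.6f}")
```

In this toy setting the gain k = 1 tracks the ramp exactly, so the trained gain should approach 1. With a real differentiable simulator, the hand-written sensitivity update would typically be replaced by automatic differentiation, and the scalar gain by a neural network policy.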