Reinforcement Learning (RL) allows learning non-trivial robot control laws purely from data. However, many successful applications of RL have relied on ad-hoc regularization, such as hand-crafted curricula, to obtain good learning performance. In this paper, we pair a recent algorithm for automatically building curricula with RL on massively parallelized simulations to learn a tracking controller for a spherical pendulum on a robotic arm. Through an improved optimization scheme that better respects the non-Euclidean task structure, we enable the method to reliably generate curricula of trajectories to be tracked, resulting in faster and more robust learning than an RL baseline that does not exploit this form of structured learning. The learned policy matches the performance of an optimal control baseline on the real system, demonstrating the potential of curriculum RL to jointly learn state estimation and control for non-linear tracking tasks.