Deep reinforcement learning (RL) can enable robots to autonomously acquire complex behaviors, such as legged locomotion. However, RL in the real world is complicated by constraints on efficiency, safety, and overall training stability, which limits its practical applicability. We present APRL, a policy regularization framework that modulates the robot's exploration over the course of training, striking a balance between flexible improvement potential and focused, efficient exploration. APRL enables a quadrupedal robot to efficiently learn to walk entirely in the real world within minutes and continue to improve with more training where prior work saturates in performance. We demonstrate that continued training with APRL results in a policy that is substantially more capable of navigating challenging situations and is able to adapt to changes in dynamics with continued training.