Deep Reinforcement Learning (RL) has demonstrated impressive results in solving complex robotic tasks such as quadruped locomotion. Yet, current solvers fail to produce efficient policies that respect hard constraints. In this work, we advocate for integrating constraints into robot learning and present Constraints as Terminations (CaT), a novel constrained RL algorithm. Departing from classical constrained RL formulations, we reformulate constraints through stochastic terminations during policy learning: any violation of a constraint triggers a probability of terminating the future rewards the RL agent could otherwise attain. We propose an algorithmic approach to this formulation that minimally modifies widely used off-the-shelf RL algorithms in robot learning (such as Proximal Policy Optimization). Our approach leads to excellent constraint adherence without introducing undue complexity or computational overhead, thus mitigating barriers to broader adoption. Through empirical evaluation on the real quadruped robot Solo crossing challenging obstacles, we demonstrate that CaT provides a compelling solution for incorporating constraints into RL frameworks. Videos and code are available at https://constraints-as-terminations.github.io.
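To make the core idea concrete, below is a minimal Python sketch of how stochastic terminations could be wired into a PPO-style value target. The function names (`termination_probability`, `td_target`), the cap `p_max`, and the max-based normalization are illustrative assumptions, not the paper's exact formulation; the abstract only specifies that a constraint violation induces a probability of terminating the agent's future rewards.

```python
import numpy as np

def termination_probability(violations, violation_scale, p_max=0.95):
    """Hypothetical mapping from constraint violations to a termination
    probability: each violation is normalized to [0, 1] by a per-constraint
    scale, and the worst violation sets the chance of ending the episode,
    capped at p_max. The aggregation rule is an illustrative assumption."""
    normalized = np.clip(np.asarray(violations) / violation_scale, 0.0, 1.0)
    return p_max * float(np.max(normalized, initial=0.0))

def td_target(reward, next_value, gamma, p_term):
    """One-step target where the termination probability discounts the
    bootstrapped value: with probability p_term the episode ends and all
    future rewards are lost, so the expected continuation value shrinks
    by the factor (1 - p_term)."""
    return reward + gamma * (1.0 - p_term) * next_value
```

Since this only reshapes the value bootstrap (and hence the advantage estimates), such a scheme could plausibly be dropped into an off-the-shelf PPO implementation without touching the optimizer or network architecture, consistent with the low-overhead integration the abstract emphasizes.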