Shifting from traditional control strategies to deep reinforcement learning (RL) for legged robots poses inherent challenges, especially in respecting real-world physical constraints during training. High-fidelity simulations provide significant benefits, but they often bypass these essential physical limitations. In this paper, we experiment with the Constrained Markov Decision Process (CMDP) framework instead of conventional unconstrained RL for robotic applications. We perform a comparative study of different constrained policy optimization algorithms to identify suitable methods for practical implementation. Our robot experiments demonstrate the critical role of incorporating physical constraints, which yields successful sim-to-real transfer and reduces operational errors on physical systems. The CMDP formulation streamlines the training process by handling constraints separately from rewards. Our findings underscore the potential of constrained RL for the effective development and deployment of learned controllers in robotics.
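For readers unfamiliar with the framework, the separation of constraints from rewards can be illustrated with the standard textbook form of a CMDP. This is a generic sketch, not necessarily the notation or exact formulation used in the paper: the reward $r$, the cost functions $c_i$, the thresholds $d_i$, the discount factor $\gamma$, and the number of constraints $m$ are all assumed symbols.

```latex
\begin{aligned}
\max_{\pi}\quad & J_R(\pi) = \mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right] \\
\text{s.t.}\quad & J_{C_i}(\pi) = \mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c_i(s_t, a_t)\right] \le d_i,
\qquad i = 1, \dots, m
\end{aligned}
```

In this form each physical limit would be expressed as its own cost $c_i$ with a threshold $d_i$, rather than folded into the reward as a weighted penalty term. By contrast, unconstrained RL optimizes only $J_R(\pi)$.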