Enabling bipedal walking robots to learn to maneuver over highly uneven, dynamically changing terrain is challenging because of the complexity of both the robot dynamics and the environments the robot interacts with. Recent advances in learning from demonstrations have shown promising results for robot learning in complex environments. While imitation learning of expert policies has been well explored, learning expert reward functions remains largely under-explored in legged locomotion. This paper applies state-of-the-art Inverse Reinforcement Learning (IRL) techniques to bipedal locomotion over complex terrains. We propose algorithms for learning expert reward functions and then analyze the learned functions. Using nonlinear function approximation, we uncover meaningful insights into the expert's locomotion strategies. Furthermore, we empirically demonstrate that training a bipedal locomotion policy with the inferred reward functions improves its walking performance on unseen terrains, highlighting the adaptability offered by reward learning.