A necessary capability for humanoid robots is the ability to stand and walk while rejecting natural disturbances. Recent progress has been made using sim-to-real reinforcement learning (RL) to train such locomotion controllers, with approaches differing mainly in their reward functions. However, prior work lacks a clear method for systematically testing new reward functions and comparing controller performance through repeatable experiments. This limits our understanding of the trade-offs between approaches and hinders progress. To address this, we propose a low-cost, quantitative benchmarking method to evaluate and compare the real-world performance of standing and walking (SaW) controllers on metrics such as command following, disturbance recovery, and energy efficiency. We also revisit reward function design and construct a minimally constraining reward function to train SaW controllers. We experimentally verify that our benchmarking framework can identify areas for improvement, which can then be systematically addressed to enhance the policies. We also compare our new controller to state-of-the-art controllers on the Digit humanoid robot. The results reveal clear quantitative trade-offs among the controllers and suggest directions for future improvements to the reward functions and expansion of the benchmarks.