Achieving stable and energy-efficient locomotion is essential for humanoid robots to operate continuously in real-world applications. Existing MPC and RL approaches typically embed energy-related metrics within a multi-objective optimization framework, an approach that requires extensive hyperparameter tuning and often yields suboptimal policies. To address these challenges, we propose ECO (Energy-Constrained Optimization), a constrained RL framework that separates energy-related metrics from rewards, reformulating them as explicit inequality constraints. This formulation gives energy costs a clear and interpretable physical representation, enabling more efficient and intuitive hyperparameter tuning for improved energy efficiency. ECO introduces dedicated constraints on energy consumption and reference motion, enforced via the Lagrangian method, to achieve stable, symmetric, and energy-efficient walking for humanoid robots. We evaluated ECO against MPC, standard RL with reward shaping, and four state-of-the-art constrained RL methods. Experiments, including sim-to-sim and sim-to-real transfer on the kid-sized humanoid robot BRUCE, show that ECO significantly reduces energy consumption relative to baselines while maintaining robust walking performance. These results represent a substantial advance in energy-efficient humanoid locomotion. All experimental demonstrations can be found on the project website: https://sites.google.com/view/eco-humanoid.
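To illustrate the constrained-RL idea described above, the following is a minimal sketch of a Lagrangian dual update for an inequality constraint of the form J_c ≤ d, where J_c is an average energy cost and d is an energy budget. All names here (`budget`, `eta`, the toy cost value) are illustrative assumptions, not details taken from the paper.

```python
def dual_update(lmbda, avg_energy_cost, budget, eta=0.1):
    """Gradient ascent on the dual variable, projected back to [0, inf):
    lambda <- max(0, lambda + eta * (J_c - d)).
    While the constraint is violated (J_c > d), lambda grows and the
    energy penalty on the policy objective tightens automatically."""
    return max(0.0, lmbda + eta * (avg_energy_cost - budget))


def lagrangian_reward(reward, energy_cost, lmbda):
    """Scalarized objective the policy maximizes: r - lambda * c.
    Unlike fixed reward shaping, the trade-off weight lambda is
    adapted by the dual update rather than hand-tuned."""
    return reward - lmbda * energy_cost


# Toy example: the measured energy cost (1.5) exceeds the budget (1.0),
# so the multiplier increases by eta * 0.5 = 0.05 per update.
lmbda = 0.0
for _ in range(3):
    lmbda = dual_update(lmbda, avg_energy_cost=1.5, budget=1.0, eta=0.1)
```

The key property this sketch shows is that the energy budget `d` is an interpretable physical quantity (e.g. average power), whereas a reward-shaping weight has no direct physical meaning and must be searched over.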