Learning-based quadrupedal locomotion typically relies on complex reward formulations that entangle task specification, operational limits, gait preference, and terrain adaptation within a single optimization objective. We instead assign each of these functions to a distinct mechanism: rewards for task specification, constraints for operational limits, energy minimization for gait preference, and exteroceptive perception for adapting energy use to terrain difficulty. We show that these components jointly enable efficient, terrain-adaptive locomotion, and that removing any one of them exposes a distinct failure mode. Our formulation drops explicit gait priors (including air-time, contact-count, and foot-clearance targets) in favor of emergent behavior. Compared to a conventional complex-reward baseline, it achieves comparable terrain traversal while reducing cost of transport by 56% and operational-limit violations by 96%. The resulting policies transfer zero-shot to a physical Unitree Go2 using LiDAR-based elevation mapping. Project website with videos: https://tinyurl.com/locomposition.
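The abstract reports a reduction in cost of transport but does not define the metric. The sketch below assumes the common dimensionless definition, CoT = E / (m g d), with energy integrated from a mechanical-power proxy (the sum of |torque × joint velocity| over joints). The function names, the power proxy, and all numeric values are illustrative assumptions, not the authors' implementation.

```python
G = 9.81  # gravitational acceleration in m/s^2


def mechanical_power(torques, joint_velocities):
    """Assumed power proxy: sum of |tau_i * qd_i| over all joints, in watts."""
    return sum(abs(t * v) for t, v in zip(torques, joint_velocities))


def cost_of_transport(powers, dt, mass, distance):
    """Dimensionless CoT = E / (m * g * d).

    powers:   power samples in watts, taken every dt seconds
    mass:     robot mass in kg
    distance: distance travelled in metres over the same interval
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    energy = sum(powers) * dt  # rectangle-rule integration, in joules
    return energy / (mass * G * distance)


def relative_reduction(baseline, ours):
    """Fractional reduction of `ours` relative to `baseline`."""
    return (baseline - ours) / baseline
```

Under this definition, a policy whose CoT is 0.44 times that of the baseline corresponds to the 56% reduction quoted above; `relative_reduction(1.0, 0.44)` evaluates to approximately 0.56. Normalizing by weight and distance keeps the metric comparable across robots and speeds, which is why it is a common choice for this kind of comparison.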