A core strength of Model Predictive Control (MPC) for quadrupedal locomotion is its ability to enforce constraints and to make the planned command sequence over the horizon interpretable. However, despite this planning capability, MPC struggles to scale with task complexity and often fails to produce robust behavior on rapidly changing surfaces. Model-free Reinforcement Learning (RL) methods, in contrast, have outperformed MPC across multiple terrains and exhibit emergent motions, but they inherently lack the ability to handle constraints or to plan. To address these limitations, we propose a framework that integrates proprioceptive planning with RL, enabling agile yet safe locomotion behaviors over the horizon. Inspired by MPC, we incorporate an internal model comprising a velocity estimator and a Dreamer module. During training, the framework learns an expert policy and an internal model that are co-dependent, which facilitates exploration and improves locomotion behaviors. During deployment, the Dreamer module solves an infinite-horizon MPC problem, adapting actions and velocity commands to respect the constraints. We validate the robustness of our training framework through ablation studies on the internal-model components and demonstrate improved robustness to training noise. Finally, we evaluate our approach in multi-terrain scenarios in both simulation and hardware.