Model-free reinforcement learning (RL) has enabled adaptable and agile quadruped locomotion; however, policies often converge to a single gait, leading to suboptimal performance. Traditionally, Model Predictive Control (MPC) has been extensively used to obtain task-specific optimal policies, but it lacks the ability to adapt to varying environments. To address these limitations, we propose an optimization framework for real-time gait adaptation in a continuous gait space, combining the Model Predictive Path Integral (MPPI) algorithm with a Dreamer module to produce adaptive and optimal policies for quadruped locomotion. At each time step, MPPI jointly optimizes the actions and gait variables using a learned Dreamer reward that promotes velocity tracking, energy efficiency, stability, and smooth transitions, while penalizing abrupt gait changes. A learned value function is incorporated as a terminal reward, extending the formulation to an infinite-horizon planner. We evaluate our framework in simulation on the Unitree Go1, demonstrating an average reduction of up to 36.48% in energy consumption across varying target speeds, while maintaining accurate tracking and adaptive, task-appropriate gaits.
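The planning loop described above, where MPPI samples perturbed action/gait sequences, scores them with a learned reward plus a learned terminal value, and takes a softmax-weighted average, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `dynamics`, `reward`, and `value` callables are hypothetical stand-ins for the learned Dreamer world model, and all hyperparameter names are assumptions.

```python
import numpy as np

def mppi_plan(state, mean_seq, dynamics, reward, value,
              num_samples=64, sigma=0.3, temperature=1.0):
    """One MPPI planning step with a learned terminal value (hedged sketch).

    state:    current (latent) state vector.
    mean_seq: (horizon, act_dim) nominal sequence of concatenated
              action and gait variables, jointly optimized as in the text.
    dynamics, reward, value: placeholder callables standing in for the
    learned Dreamer world model; not the paper's actual interfaces.
    """
    horizon, act_dim = mean_seq.shape
    noise = sigma * np.random.randn(num_samples, horizon, act_dim)
    candidates = mean_seq[None] + noise  # perturbed action/gait sequences

    returns = np.zeros(num_samples)
    for k in range(num_samples):
        s = state
        for t in range(horizon):
            returns[k] += reward(s, candidates[k, t])
            s = dynamics(s, candidates[k, t])
        # Learned value at the final state acts as the terminal reward,
        # extending the finite rollout toward an infinite-horizon objective.
        returns[k] += value(s)

    # Path-integral update: softmax-weighted average of sampled sequences.
    w = np.exp((returns - returns.max()) / temperature)
    w /= w.sum()
    return (w[:, None, None] * candidates).sum(axis=0)
```

In practice the first element of the returned sequence would be executed, the sequence shifted, and the procedure repeated at the next time step (receding horizon).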