We present a minimal phase oscillator model for learning quadrupedal locomotion. Each of the four oscillators is coupled only to itself and to its corresponding leg, through local feedback of the ground reaction force; this feedback can be interpreted as an observer feedback gain. We interpret the oscillator itself as a latent contact-state estimator. Through a systematic ablation study, we show that the combination of phase observations, simple phase-based rewards, and the local feedback dynamics induces policies that exhibit emergent gait preferences, while using a reduced set of simple rewards and without prescribing a specific gait. The code is open-source, and a video synopsis is available at https://youtu.be/1NKQ0rSV3jU.
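To make the decoupled feedback structure concrete, here is a minimal sketch. It is not the model used in this work. It assumes a Tegotae-style update borrowed from the CPG literature, phi_i' = omega - sigma * N_i * cos(phi_i), where N_i is leg i's ground reaction force and sigma plays the role of the observer feedback gain. It also assumes a hypothetical convention in which phases in [pi, 2*pi) are stance. The names `step_oscillators` and `predicted_contact` are illustrative only. Note that oscillator i reads only N_i, so no inter-leg coupling term appears.

```python
import math
from typing import List

TWO_PI = 2.0 * math.pi


def step_oscillators(phases: List[float], grf: List[float],
                     omega: float, sigma: float, dt: float) -> List[float]:
    """Advance decoupled phase oscillators by one explicit Euler step.

    Assumed (hypothetical) dynamics per leg i:
        dphi_i/dt = omega - sigma * grf_i * cos(phi_i)
    Each oscillator sees only its own leg's ground reaction force, so
    sigma acts like a local observer feedback gain.
    """
    new_phases = []
    for phi, force in zip(phases, grf):
        # Intrinsic frequency minus the local force-feedback correction.
        dphi = omega - sigma * force * math.cos(phi)
        # Wrap the phase back into [0, 2*pi).
        new_phases.append((phi + dphi * dt) % TWO_PI)
    return new_phases


def predicted_contact(phi: float) -> bool:
    """Read the phase as a latent contact estimate.

    Hypothetical convention: [pi, 2*pi) is stance, i.e. sin(phi) < 0.
    """
    return math.sin(phi) < 0.0


if __name__ == "__main__":
    phases = [0.0, 0.0, 0.0, 0.0]
    # Legs 0 and 1 are loaded; legs 2 and 3 are unloaded.
    grf = [1.0, 1.0, 0.0, 0.0]
    for _ in range(50):
        phases = step_oscillators(phases, grf, omega=TWO_PI, sigma=1.0, dt=0.01)
    print([round(p, 3) for p in phases])
    print([predicted_contact(p) for p in phases])
```

Under these assumptions an unloaded leg's oscillator free-runs at omega. A loaded leg's oscillator is slowed when cos(phi) > 0 (late stance, delaying lift-off) and sped up when cos(phi) < 0 (late swing, advancing touchdown). Each phase is corrected by its own leg's contact signal, which is the sense in which the oscillator behaves like a contact-state observer.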