We present a minimal phase oscillator model for learning quadrupedal locomotion. Each of the four oscillators is coupled only to itself and to its corresponding leg through local feedback of the ground reaction force, and this feedback can be interpreted as an observer feedback gain. We interpret the oscillator itself as a latent contact state estimator. Through a systematic ablation study, we show that the combination of phase observations, simple phase-based rewards, and the local feedback dynamics induces policies that exhibit emergent gait preferences, while using a reduced set of simple rewards and without prescribing a specific gait. The code is open-source, and a video synopsis is available at https://youtu.be/1NKQ0rSV3jU.
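To make the idea of locally coupled oscillators concrete, the following is a minimal sketch of one possible update rule, not the model from the paper. It assumes each leg's phase advances at a nominal frequency and is corrected only by that leg's own ground reaction force, using the hypothetical form dphi/dt = omega - gain * F * cos(phi). The function names, the parameters `omega`, `gain`, and `dt`, and the stance convention in `estimated_contact` are all illustrative assumptions.

```python
import math

TWO_PI = 2.0 * math.pi


def step_phases(phases, grf, omega=TWO_PI * 2.0, gain=0.05, dt=0.01):
    """Advance each leg's phase by one explicit Euler step.

    Assumed (hypothetical) dynamics: dphi/dt = omega - gain * F * cos(phi).
    Each oscillator sees only its own phase and its own leg's ground
    reaction force F, so there is no coupling between legs.
    """
    new_phases = []
    for phi, force in zip(phases, grf):
        dphi = omega - gain * force * math.cos(phi)
        # Wrap the result back into [0, 2*pi).
        new_phases.append((phi + dphi * dt) % TWO_PI)
    return new_phases


def estimated_contact(phi):
    """Read the phase as a latent contact estimate.

    Assumed convention: the first half of the cycle is stance.
    """
    return (phi % TWO_PI) < math.pi


# Example: four legs, two of them loaded (forces in newtons, made up).
phases = [0.0, math.pi, math.pi, 0.0]
grf = [40.0, 0.0, 0.0, 40.0]
phases = step_phases(phases, grf)
contacts = [estimated_contact(p) for p in phases]
```

In this sketch the `gain` term plays the role that the abstract describes as an observer feedback gain: it scales how strongly the measured force corrects the internal phase. Unloaded legs simply advance at `omega`, so any coordination between legs would have to arise through the body and the ground rather than through explicit inter-oscillator coupling.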