This work explores the potential of using differentiable simulation for learning quadruped locomotion. Differentiable simulation promises fast convergence and stable training by computing low-variance first-order gradients using robot dynamics. However, its application to legged robots has so far been limited to simulation. The main challenge lies in the complex optimization landscape of robotic tasks caused by discontinuous dynamics. This work proposes a new differentiable simulation framework to overcome these challenges. Our approach combines a high-fidelity, non-differentiable simulator for forward dynamics with a simplified surrogate model for gradient backpropagation. This approach maintains simulation accuracy by aligning the robot states of the surrogate model with those of the precise, non-differentiable simulator. Our framework enables learning quadruped walking in simulation in minutes without parallelization. When augmented with GPU parallelization, our approach allows the quadruped robot to master diverse locomotion skills on challenging terrains in minutes. We demonstrate that differentiable simulation outperforms a reinforcement learning algorithm (PPO), achieving significantly better sample efficiency while remaining effective in large-scale environments. Our method represents one of the first successful applications of differentiable simulation to real-world quadruped locomotion, offering a compelling alternative to traditional RL methods.
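The core idea of decoupling forward dynamics from gradient computation can be sketched on a toy one-dimensional system. This is a hypothetical illustration, not the paper's implementation: `accurate_step` stands in for the high-fidelity, non-differentiable simulator (with a hard, non-smooth contact clamp), `surrogate_step` is a smooth surrogate used only for gradients, and the backward pass evaluates the surrogate's Jacobians at the *accurate* states, which is the state-alignment step described above.

```python
import math

def accurate_step(x, u):
    """'High-fidelity' non-differentiable dynamics: hard contact clamp."""
    x_next = x + 0.1 * u
    return max(x_next, 0.0)  # hard, non-smooth ground contact

def surrogate_step(x, u):
    """Smooth surrogate used only for gradient backpropagation."""
    x_next = x + 0.1 * u
    return math.log1p(math.exp(x_next))  # softplus approximates max(., 0)

def surrogate_grads(x, u, eps=1e-6):
    """Finite-difference Jacobians of the surrogate, d x_next / d(x, u)."""
    dx = (surrogate_step(x + eps, u) - surrogate_step(x - eps, u)) / (2 * eps)
    du = (surrogate_step(x, u + eps) - surrogate_step(x, u - eps)) / (2 * eps)
    return dx, du

def rollout_and_gradient(x0, policy_gain, target, steps=20):
    """Forward: accurate simulator. Backward: surrogate Jacobians,
    evaluated at the accurate states (state alignment)."""
    xs, us = [x0], []
    x = x0
    for _ in range(steps):
        u = policy_gain * (target - x)   # simple proportional 'policy'
        x = accurate_step(x, u)          # accurate forward state
        us.append(u)
        xs.append(x)
    loss = (xs[-1] - target) ** 2
    # Reverse-mode chain rule through the surrogate at aligned states.
    grad_x = 2.0 * (xs[-1] - target)     # dL/dx at the final state
    grad_gain = 0.0
    for t in reversed(range(steps)):
        dx, du = surrogate_grads(xs[t], us[t])  # alignment: accurate xs[t]
        du_dgain = target - xs[t]               # du_t / d gain
        du_dx = -policy_gain                    # du_t / dx_t via the policy
        grad_gain += grad_x * du * du_dgain
        grad_x = grad_x * (dx + du * du_dx)
    return loss, grad_gain
```

Descending along `grad_gain` then improves the policy parameter even though the forward rollout itself is non-differentiable; the surrogate supplies a low-variance first-order gradient while the accurate simulator keeps the visited states faithful.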