Legged locomotion is a complex control problem that requires both accuracy and robustness to cope with real-world challenges. Legged systems have traditionally been controlled using trajectory optimization with inverse dynamics. Such hierarchical model-based methods are appealing due to intuitive cost function tuning, accurate planning, and most importantly, the insightful understanding gained from more than a decade of extensive research. However, model mismatch and violation of assumptions are common sources of faulty operation and may hinder successful sim-to-real transfer. Simulation-based reinforcement learning, on the other hand, results in locomotion policies with unprecedented robustness and recovery skills. Yet, all learning algorithms struggle with sparse rewards emerging from environments where valid footholds are rare, such as gaps or stepping stones. In this work, we propose a hybrid control architecture that combines the advantages of both worlds to simultaneously achieve greater robustness, foot-placement accuracy, and terrain generalization. Our approach utilizes a model-based planner to roll out a reference motion during training. A deep neural network policy is trained in simulation, aiming to track the optimized footholds. We evaluate the accuracy of our locomotion pipeline on sparse terrains, where purely data-driven methods are prone to fail. Furthermore, we demonstrate superior robustness in the presence of slippery or deformable ground when compared to model-based counterparts. Finally, we show that our proposed tracking controller generalizes across different trajectory optimization methods not seen during training. In conclusion, our work unites the predictive capabilities and optimality guarantees of online planning with the inherent robustness attributed to offline learning.
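To illustrate what "aiming to track the optimized footholds" could look like as a dense training signal, the following is a minimal sketch and not the reward used in this work. The function name `foothold_tracking_reward`, the Gaussian-shaped kernel, and the length scale `sigma` are all assumptions chosen for illustration. The sketch scores each stance foot by its distance to the foothold proposed by the planner, so the policy receives a graded reward even on terrain where valid footholds are rare.

```python
import math


def foothold_tracking_reward(feet, targets, in_contact, sigma=0.05):
    """Hypothetical foothold-tracking reward (illustrative assumption only).

    feet:       list of (x, y, z) foot positions measured in simulation
    targets:    list of (x, y, z) footholds proposed by a model-based planner
    in_contact: list of booleans, True for feet currently in stance
    sigma:      assumed length scale in metres controlling reward sharpness
    """
    # Only stance feet are scored; swing feet have not yet reached their target.
    errors = [
        math.dist(foot, target)
        for foot, target, contact in zip(feet, targets, in_contact)
        if contact
    ]
    if not errors:
        return 0.0
    # Gaussian-shaped kernel: 1.0 for a perfect placement, decaying with distance.
    return sum(math.exp(-((e / sigma) ** 2)) for e in errors) / len(errors)


if __name__ == "__main__":
    feet = [(0.30, 0.15, 0.0), (0.30, -0.15, 0.0)]
    targets = [(0.32, 0.15, 0.0), (0.30, -0.15, 0.0)]
    print(foothold_tracking_reward(feet, targets, [True, True]))
```

Because the reward decays smoothly with placement error rather than being zero everywhere except on valid footholds, a term of this kind is one plausible way for a planner's reference to mitigate the sparse-reward problem described above.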