Online Multi-Contact Receding Horizon Planning via Value Function Approximation

Jiayi Wang, Sanghyun Kim, Teguh Santoso Lembono, Wenqian Du, Jaehyun Shim, Saeid Samadi, Ke Wang, Vladimir Ivan, Sylvain Calinon, Sethu Vijayakumar, Steve Tonneau

From arXiv; under review

Planning multi-contact motions in a receding horizon fashion requires a value function to guide the planning with respect to the future, e.g., building momentum to traverse large obstacles. Traditionally, the value function is approximated by computing trajectories over a prediction horizon (never executed) that foresees the future beyond the execution horizon. However, given the non-convex dynamics of multi-contact motions, this approach is computationally expensive. To enable online Receding Horizon Planning (RHP) of multi-contact motions, we find efficient approximations of the value function. Specifically, we propose a trajectory-based and a learning-based approach. In the former, namely RHP with Multiple Levels of Model Fidelity, we approximate the value function by computing the prediction horizon with a convex relaxed model. In the latter, namely Locally-Guided RHP, we learn an oracle to predict local objectives for locomotion tasks, and we use these local objectives to construct local value functions for guiding a short-horizon RHP. We evaluate both approaches in simulation by planning centroidal trajectories of a humanoid robot walking on moderate slopes, and on large slopes where the robot cannot maintain static balance. Our results show that Locally-Guided RHP achieves the best computational efficiency (95%-98.6% of cycles converge online). This computational advantage enables us to demonstrate online receding horizon planning for our real-world humanoid robot Talos walking in dynamic environments that change on the fly.
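
To make the locally-guided idea concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a 1D point mass instead of centroidal dynamics, a hand-written `oracle` function as a hypothetical stand-in for the learned oracle, and a quadratic terminal cost as the local value function. All names and parameter values (`oracle`, `plan_short_horizon`, `run_rhp`, `DT`, `step`, `effort_weight`) are illustrative assumptions. Each cycle plans over a short horizon toward the local objective, executes only the first control, and replans.

```python
import numpy as np

DT = 0.1
# Toy point-mass dynamics on state (position, velocity); an assumption,
# not the non-convex centroidal model used in the paper.
A = np.array([[1.0, DT], [0.0, 1.0]])
B = np.array([[0.0], [DT]])


def oracle(state, final_goal, step=0.5):
    """Hypothetical stand-in for the learned oracle: propose a local target state."""
    offset = np.clip(final_goal - state[0], -step, step)
    return np.array([state[0] + offset, 0.0])


def plan_short_horizon(state, local_goal, horizon=10, effort_weight=1e-4):
    """Minimise control effort plus a terminal cost (the local value function)."""
    # The terminal state is linear in the controls: s_N = A^N s_0 + G u.
    G = np.hstack(
        [np.linalg.matrix_power(A, horizon - 1 - k) @ B for k in range(horizon)]
    )
    free_response = np.linalg.matrix_power(A, horizon) @ state
    # Closed-form minimiser of ||G u + free_response - local_goal||^2 + w ||u||^2.
    H = G.T @ G + effort_weight * np.eye(horizon)
    return np.linalg.solve(H, G.T @ (local_goal - free_response))


def run_rhp(initial_state, final_goal, cycles=200):
    """Receding-horizon loop: plan, execute the first control only, replan."""
    state = np.asarray(initial_state, dtype=float)
    for _ in range(cycles):
        controls = plan_short_horizon(state, oracle(state, final_goal))
        state = A @ state + B[:, 0] * controls[0]
    return state
```

Calling `run_rhp([0.0, 0.0], 3.0)` drives the toy point mass toward position 3.0 with near-zero velocity. Because the toy problem is a convex least-squares fit, every cycle has a closed-form solution; the paper's setting is harder precisely because the multi-contact dynamics are non-convex, which is why a cheap local value function matters there.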
