Online Multi-Contact Receding Horizon Planning via Value Function Approximation

Jiayi Wang, Sanghyun Kim, Teguh Santoso Lembono, Wenqian Du, Jaehyun Shim, Saeid Samadi, Ke Wang, Vladimir Ivan, Sylvain Calinon, Sethu Vijayakumar, Steve Tonneau

Planning multi-contact motions in a receding horizon fashion requires a value function that guides the plan with respect to the future, e.g., building momentum to traverse large obstacles. Traditionally, the value function is approximated by computing trajectories over a prediction horizon, which is never executed and looks ahead beyond the execution horizon. However, given the non-convex dynamics of multi-contact motions, this approach is computationally expensive. To enable online Receding Horizon Planning (RHP) of multi-contact motions, we seek efficient approximations of the value function. Specifically, we propose a trajectory-based and a learning-based approach. In the former, namely RHP with Multiple Levels of Model Fidelity, we approximate the value function by computing the prediction horizon with a convex relaxed model. In the latter, namely Locally-Guided RHP, we learn an oracle that predicts local objectives for locomotion tasks, and we use these local objectives to construct local value functions that guide a short-horizon RHP. We evaluate both approaches in simulation by planning centroidal trajectories of a humanoid robot walking on moderate slopes, and on large slopes where the robot cannot maintain static balance. Our results show that Locally-Guided RHP achieves the best computational efficiency (95%-98.6% of cycles converge online). This computational advantage enables us to demonstrate online receding horizon planning on our real-world humanoid robot Talos, walking in dynamic environments that change on the fly.
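
As a reading aid, the role of the value function can be written as a generic receding horizon problem. This is only a sketch under assumed notation, not the paper's formulation: the state $x_t$, control $u_t$, running cost $\ell$, dynamics $f$, relaxed dynamics $\tilde{f}$, horizon lengths $T$ and $P$, oracle target $x^{\mathrm{oracle}}$, and weight $W$ are all placeholder symbols introduced here, and the quadratic form of the local value function is an assumed example.

```latex
% Execution horizon with an approximate value function \hat{V} as terminal cost
% (all symbols are illustrative assumptions, not taken from the paper).
\begin{aligned}
\min_{x_{0:T},\, u_{0:T-1}} \quad & \sum_{t=0}^{T-1} \ell(x_t, u_t) + \hat{V}(x_T) \\
\text{s.t.} \quad & x_{t+1} = f(x_t, u_t), \qquad x_0 = x_{\mathrm{current}}
\end{aligned}

% Trajectory-based approximation: a prediction horizon of length P,
% computed with a relaxed model \tilde{f} and never executed.
\hat{V}(x_T) \approx \min_{x_{T+1:T+P},\, u_{T:T+P-1}}
  \sum_{t=T}^{T+P-1} \ell(x_t, u_t)
  \quad \text{s.t.} \quad x_{t+1} = \tilde{f}(x_t, u_t)

% Learning-based approximation: a local value function built around
% a local objective predicted by an oracle (quadratic form assumed here).
\hat{V}(x_T) \approx \left\| x_T - x^{\mathrm{oracle}} \right\|_W^2
```

In both cases only the execution horizon is applied to the robot, and the problem is solved again from the new state at the next cycle. The two approaches differ in how cheaply $\hat{V}$ can be evaluated.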
