Designing a humanoid locomotion controller is challenging, and the task is classically split into sub-problems. Footstep planning, which defines the sequence of footsteps, is one of them. Even in simple environments, finding a minimal sequence, or even a feasible one, is a complex optimization problem. In the literature, this problem is usually addressed with search-based algorithms (e.g. variants of A*). However, such approaches are either computationally expensive or rely on hand-crafted tuning of several parameters. In this work, we first propose an efficient footstep planning method for navigating local environments with obstacles, based on state-of-the-art Deep Reinforcement Learning (DRL) techniques, with very low computational requirements for on-line inference. Our approach is heuristic-free and relies on a continuous set of actions to generate feasible footsteps, whereas other methods require selecting a relevant discrete set of actions. Second, we propose a forecasting method that quickly estimates the number of footsteps required to reach different candidate local targets. This method relies on computations inherent to the actor-critic DRL architecture. We demonstrate the validity of our approach with simulation results and with a deployment on a kid-size humanoid robot during the RoboCup 2023 competition.
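To illustrate how a critic's output could serve as a footstep forecast, the sketch below inverts a discounted return into a step count. It rests on assumptions not stated above: a reward of -1 per footstep and a discount factor `gamma`. The function names and the `critic` callable are hypothetical and do not describe the authors' implementation.

```python
import math


def value_of_n_steps(n, gamma):
    """Discounted return of n steps that each earn a reward of -1 (assumed reward scheme)."""
    if gamma == 1.0:
        return -float(n)
    return -(1.0 - gamma ** n) / (1.0 - gamma)


def steps_from_value(value, gamma):
    """Invert value_of_n_steps: recover a step count from a critic value estimate."""
    if gamma == 1.0:
        return -value
    inner = 1.0 + value * (1.0 - gamma)
    if inner <= 0.0:
        # The value is at or below the bound for an infinite horizon:
        # treat the target as unreachable.
        return math.inf
    return math.log(inner) / math.log(gamma)


def rank_targets(candidates, critic, gamma):
    """Sort candidate targets by forecast step count, fewest first.

    `critic` is a hypothetical callable mapping a candidate to a value estimate.
    """
    estimates = [(steps_from_value(critic(c), gamma), c) for c in candidates]
    return sorted(estimates, key=lambda pair: pair[0])
```

Under these assumptions, one critic evaluation per candidate is enough to compare targets, without running the planner for each of them.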