We present a footstep planning policy for quadrupedal locomotion that directly incorporates a priori safety information into its decisions. At its core, a learning process analyzes terrain patches and classifies each candidate landing location by its kinematic feasibility, shin collision, and terrain roughness. This information is encoded into a compact vector representation and passed as an additional state to the footstep planning policy, which is trained with a masked variant of the Proximal Policy Optimization (PPO) algorithm so that it proposes only safe footstep locations. We evaluate the proposed approach through comparative simulations and experiments on an electric quadruped robot walking in different rough-terrain scenarios. Violations of the above safety conditions are greatly reduced both during training and during subsequent deployment of the policy, resulting in an inherently safer footstep planner. As a byproduct, fewer reward terms are needed to shape the behavior of the policy, which in turn achieves both better final performance and better sample efficiency.
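To make the masking idea concrete, the following is a minimal sketch of how unsafe footstep candidates could be excluded from a discrete policy distribution. It is not the authors' implementation. It assumes a discretized set of candidate landing cells, per-cell outputs for the three safety criteria, and a roughness threshold `max_roughness`. All function names and the threshold value are hypothetical and chosen only for illustration.

```python
import math
import random


def safe_mask(feasible, no_shin_collision, roughness, max_roughness=0.5):
    """Combine three per-cell safety checks into one boolean mask.

    The three inputs and the threshold are illustrative assumptions,
    not the encoding used in the paper.
    """
    return [
        f and c and (r <= max_roughness)
        for f, c, r in zip(feasible, no_shin_collision, roughness)
    ]


def masked_softmax(logits, mask):
    """Softmax over logits with unsafe entries forced to probability 0."""
    if not any(mask):
        raise ValueError("no safe footstep candidate available")
    # Subtract the largest safe logit for numerical stability.
    top = max(l for l, m in zip(logits, mask) if m)
    exps = [math.exp(l - top) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_footstep(logits, mask, rng=random):
    """Sample a candidate index from the masked distribution."""
    probs = masked_softmax(logits, mask)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def log_prob(logits, mask, action):
    """Log-probability of an action under the masked distribution.

    In a PPO-style update, the same mask would be reapplied here when
    computing the probability ratio for a stored action.
    """
    return math.log(masked_softmax(logits, mask)[action])


# Four hypothetical candidate cells around a nominal foothold.
logits = [2.0, 0.5, 1.0, -1.0]
mask = safe_mask(
    feasible=[True, True, True, False],
    no_shin_collision=[False, True, True, True],
    roughness=[0.1, 0.2, 0.3, 0.1],
)
probs = masked_softmax(logits, mask)
action = sample_footstep(logits, mask, rng=random.Random(0))
```

In this sketch the first cell is excluded despite having the highest logit, because it fails the shin-collision check. The policy can therefore only sample from the two cells that pass all three checks. Masking at the distribution level, rather than penalizing unsafe choices through the reward, is one plausible reading of why fewer reward terms would be needed.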