We present a footstep planning policy for quadrupedal locomotion that directly takes a priori safety information into account in its decisions. At its core, a learning process analyzes terrain patches, classifying each landing location by its kinematic feasibility, shin collision, and terrain roughness. This information is then encoded into a small vector representation and passed as an additional state to the footstep planning policy, which in addition proposes only safe footstep locations by applying a masked variant of the Proximal Policy Optimization (PPO) algorithm. The performance of the proposed approach is demonstrated in comparative simulations of an electric quadruped robot walking in different rough-terrain scenarios. We show that violations of the above safety conditions are greatly reduced both during training and during the subsequent deployment of the policy, resulting in an inherently safer footstep planner. Furthermore, we show that, as a byproduct, fewer reward terms are needed to shape the behavior of the policy, which in turn achieves both better final performance and better sample efficiency.
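The masking idea mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a discrete set of candidate landing locations, per-candidate boolean flags for the three safety criteria, and hypothetical function names (`masked_probabilities`, `sample_safe_footstep`). Unsafe candidates receive zero probability before sampling, which is the general principle behind action masking in policy-gradient methods.

```python
import math
import random


def masked_probabilities(logits, safe_mask):
    """Softmax over the safe candidates only; unsafe candidates get probability 0."""
    if len(logits) != len(safe_mask):
        raise ValueError("logits and safe_mask must have the same length")
    if not any(safe_mask):
        raise ValueError("at least one candidate location must be safe")
    # Subtract the largest safe logit for numerical stability.
    top = max(l for l, ok in zip(logits, safe_mask) if ok)
    weights = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, safe_mask)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_safe_footstep(logits, safe_mask, rng):
    """Sample a candidate index from the masked distribution."""
    probs = masked_probabilities(logits, safe_mask)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical safety flags for six candidate landing locations (illustrative values only).
kinematically_feasible = [True, True, False, True, True, True]
shin_collision = [False, True, False, False, False, False]
too_rough = [False, False, False, False, True, False]
safe_mask = [
    k and not s and not r
    for k, s, r in zip(kinematically_feasible, shin_collision, too_rough)
]

# Hypothetical policy outputs (unnormalized scores) for the same candidates.
logits = [0.2, 1.5, 0.3, -0.4, 2.0, 0.9]
probs = masked_probabilities(logits, safe_mask)
rng = random.Random(0)
choice = sample_safe_footstep(logits, safe_mask, rng)
```

In a masked policy-gradient setup, the same masked distribution would typically be used both when sampling actions and when computing log-probabilities for the update, so that unsafe locations contribute nothing to either; how the paper realizes this within PPO is not specified in the abstract.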