Legged robots have the potential to expand the reach of autonomy beyond paved roads. In this work, we consider the difficult problem of locomotion on challenging terrain using a single forward-facing depth camera. Because the problem is partially observable, the robot has to rely on past observations to infer the terrain currently beneath it. To solve this problem, we follow the paradigm in computer vision that explicitly models the 3D geometry of the scene and propose Neural Volumetric Memory (NVM), a geometric memory architecture that explicitly accounts for the SE(3) equivariance of the 3D world. NVM aggregates feature volumes from multiple camera views by first bringing them back into the ego-centric frame of the robot. We test the learned visual-locomotion policy on a physical robot and show that our approach, which explicitly introduces geometric priors during training, outperforms more naïve methods. We also include ablation studies and show that the representations stored in the neural volumetric memory capture sufficient geometric information to reconstruct the scene. Videos are available on our project page at https://rchalyang.github.io/NVM.
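The abstract does not say how past feature volumes are resampled or fused, so the following is only a minimal sketch of the general idea of bringing a past feature volume into the current ego-centric frame with an SE(3) transform. It assumes unit-sized voxels whose index equals their metric coordinate, nearest-neighbour sampling, a yaw-plus-translation pose, and plain averaging as the fusion step. The helpers `se3`, `warp_volume`, and `aggregate` are hypothetical names for illustration, not the NVM implementation.

```python
import numpy as np


def se3(yaw: float, translation) -> np.ndarray:
    """Hypothetical helper: 4x4 transform = rotation about z by `yaw` radians, then translation."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = translation
    return T


def warp_volume(volume: np.ndarray, T_past_from_now: np.ndarray) -> np.ndarray:
    """Resample a past-frame feature volume of shape (C, X, Y, Z) onto the current ego-centric grid.

    Assumption: voxel index (i, j, k) sits at metric point (i, j, k) in its own frame.
    Nearest-neighbour lookup; current voxels that land outside the past volume get zeros.
    """
    C, X, Y, Z = volume.shape
    grid = np.stack(np.meshgrid(np.arange(X), np.arange(Y), np.arange(Z), indexing="ij"), axis=-1)
    pts = grid.reshape(-1, 3).astype(float)                   # current-frame voxel centres, (N, 3)
    homo = np.concatenate([pts, np.ones((len(pts), 1))], axis=1)
    src = (homo @ T_past_from_now.T)[:, :3]                  # same points expressed in the past frame
    src_idx = np.rint(src).astype(int)                       # nearest past voxel
    valid = np.all((src_idx >= 0) & (src_idx < np.array([X, Y, Z])), axis=1)
    out = np.zeros((C, len(pts)), dtype=volume.dtype)
    si = src_idx[valid]
    out[:, valid] = volume[:, si[:, 0], si[:, 1], si[:, 2]]  # gather features for in-bounds voxels
    return out.reshape(C, X, Y, Z)


def aggregate(volumes, transforms) -> np.ndarray:
    """Warp every past volume into the current frame and average them (a stand-in for learned fusion)."""
    warped = [warp_volume(v, T) for v, T in zip(volumes, transforms)]
    return np.mean(warped, axis=0)
```

For example, if the robot has moved one voxel forward along x since a past view, `warp_volume(vol, se3(0.0, [1, 0, 0]))` shifts that view's features back by one slice so that they line up with the current frame, and the slice with no past coverage is zero-filled. A real system would more likely use trilinear interpolation and a learned fusion step rather than nearest-neighbour lookup and averaging.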