We present VAPOR, a novel method for autonomous legged robot navigation in unstructured, densely vegetated outdoor environments using offline Reinforcement Learning (RL). Our method trains a novel RL policy using an actor-critic network and arbitrary data collected in real outdoor vegetation. Our policy uses height- and intensity-based cost maps derived from 3D LiDAR point clouds, a goal cost map, and processed proprioception data as state inputs, and learns the physical and geometric properties of the surrounding obstacles, such as height, density, and solidity/stiffness. The fully trained policy's critic network is then used to evaluate the quality of dynamically feasible velocities generated by a novel context-aware planner. Our planner adapts the robot's velocity space based on the presence of entrapment-inducing vegetation and narrow passages in dense environments. We demonstrate our method's capabilities on a Spot robot in complex real-world outdoor scenes, including dense vegetation. We observe that VAPOR's actions improve success rates by up to 40%, decrease the average current consumption by up to 2.9%, and decrease the normalized trajectory length by up to 11.2% compared to existing end-to-end offline RL and other outdoor navigation methods.
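The interaction between the context-aware planner and the critic can be illustrated with a minimal sketch. It is not the VAPOR implementation: the dynamic-window-style sampling, the limit-shrinking rule in `adapt_limits`, the acceleration bounds, and `toy_critic` (a stand-in for the trained critic network) are all assumptions introduced here for illustration. The sketch only shows the general pattern of restricting the velocity space according to context, sampling dynamically feasible velocities, and selecting the one the critic scores highest.

```python
import numpy as np


def adapt_limits(v_limits, w_limits, entrapment_risk=False, narrow_passage=False):
    """Hypothetical context rule (an assumption, not from the paper):
    halve the forward speed limit near entrapment-inducing vegetation and
    halve the turn-rate limits in narrow passages."""
    v_lo, v_hi = v_limits
    w_lo, w_hi = w_limits
    if entrapment_risk:
        v_hi *= 0.5
    if narrow_passage:
        w_lo *= 0.5
        w_hi *= 0.5
    return (v_lo, v_hi), (w_lo, w_hi)


def sample_feasible_velocities(v, w, v_limits, w_limits,
                               dt=0.1, a_max=1.0, alpha_max=2.0, n=7):
    """Sample (linear, angular) velocity pairs reachable within dt under
    assumed acceleration bounds, clipped to the current velocity limits."""
    v_lo = np.clip(v - a_max * dt, *v_limits)
    v_hi = np.clip(v + a_max * dt, *v_limits)
    w_lo = np.clip(w - alpha_max * dt, *w_limits)
    w_hi = np.clip(w + alpha_max * dt, *w_limits)
    vv, ww = np.meshgrid(np.linspace(v_lo, v_hi, n),
                         np.linspace(w_lo, w_hi, n), indexing="ij")
    return np.stack([vv.ravel(), ww.ravel()], axis=1)  # shape (n * n, 2)


def toy_critic(state, actions):
    """Stand-in for a trained critic Q(s, a): rewards forward speed and
    penalizes turn rates that deviate from a desired one."""
    v, w = actions[:, 0], actions[:, 1]
    return v - np.abs(w - state["desired_turn_rate"])


def select_velocity(state, v, w, critic, **context):
    """Pick the feasible velocity with the highest critic score."""
    v_limits, w_limits = adapt_limits((0.0, 1.0), (-1.0, 1.0), **context)
    candidates = sample_feasible_velocities(v, w, v_limits, w_limits)
    scores = critic(state, candidates)
    return candidates[int(np.argmax(scores))]


if __name__ == "__main__":
    state = {"desired_turn_rate": 0.0}
    print(select_velocity(state, v=0.5, w=0.0, critic=toy_critic))
    print(select_velocity(state, v=0.5, w=0.0, critic=toy_critic,
                          entrapment_risk=True))
```

In this sketch the critic never sees velocities outside the context-adapted limits, so the context rule acts as a hard constraint, while the learned value only ranks the candidates that remain.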