We present an autonomous navigation system that operates without assuming HD LiDAR maps of the environment. Our system, ALT-Pilot, relies only on publicly available road network information and a sparse (and noisy) set of crowdsourced language landmarks. Using onboard sensors and a language-augmented topometric map, ALT-Pilot autonomously pilots the vehicle to any destination on the road network. We achieve this by (i) leveraging vision-language models pre-trained on web-scale data to identify potential landmarks in a scene, (ii) incorporating vision-language features into the recursive Bayesian state estimation stack to generate global (route) plans, and (iii) using a reactive trajectory planner and controller operating in the vehicle frame. We implement and evaluate ALT-Pilot in simulation and on a real, full-scale autonomous vehicle, and report improvements over state-of-the-art topometric navigation systems of 3x in localization accuracy and 5x in goal reachability.
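To illustrate how language landmarks can enter a recursive Bayesian state estimator, the following is a minimal sketch and not ALT-Pilot's implementation. It assumes a circular route discretized into cells, at most one labeled landmark per cell, and hypothetical per-label detection scores in [0, 1] standing in for vision-language similarity. The function names, the odometry noise kernel, and the likelihood floor are all illustrative choices.

```python
def normalize(p):
    """Scale a list of non-negative weights so that it sums to 1."""
    total = sum(p)
    return [x / total for x in p]


def predict(belief, step, noise=(0.1, 0.8, 0.1)):
    """Motion update: shift the belief by `step` cells and blur it.

    `noise` gives the (assumed) probabilities of undershooting by one cell,
    moving exactly `step` cells, and overshooting by one cell.
    The route is treated as circular to keep the indexing simple.
    """
    n = len(belief)
    out = [0.0] * n
    for i, mass in enumerate(belief):
        for offset, weight in zip((-1, 0, 1), noise):
            out[(i + step + offset) % n] += mass * weight
    return normalize(out)


def update(belief, landmark_at, scores, floor=0.05):
    """Measurement update from language-landmark detection scores.

    `landmark_at[i]` is the landmark label mapped at cell i, or None.
    `scores` maps label -> detection score in [0, 1] (hypothetical stand-in
    for a vision-language similarity). `floor` keeps every cell's likelihood
    positive so that a noisy map cannot zero out the true position.
    """
    best = max(scores.values(), default=0.0)
    likelihood = []
    for label in landmark_at:
        if label is None:
            # Cells without a landmark are likely when nothing is detected.
            likelihood.append(max(1.0 - best, floor))
        else:
            likelihood.append(max(scores.get(label, 0.0), floor))
    return normalize([b * l for b, l in zip(belief, likelihood)])


def localize(landmark_at, steps_and_scores):
    """Run predict/update from a uniform prior over all cells."""
    n = len(landmark_at)
    belief = [1.0 / n] * n
    for step, scores in steps_and_scores:
        belief = predict(belief, step)
        belief = update(belief, landmark_at, scores)
    return belief


# Toy route with two crowdsourced language landmarks.
route = [None, None, "fountain", None, None, None, None, "bus stop", None, None]
observations = [
    (0, {"fountain": 0.9, "bus stop": 0.1}),   # fountain in view
    (3, {"fountain": 0.05, "bus stop": 0.1}),  # nothing recognizable
    (2, {"fountain": 0.05, "bus stop": 0.8}),  # bus stop in view
]
belief = localize(route, observations)
best_cell = max(range(len(belief)), key=belief.__getitem__)
```

In this toy run the belief starts uniform, sharpens around the fountain, is carried forward by odometry, and finally concentrates on the cell holding the bus stop landmark. A histogram filter is used here only because it is easy to inspect; a particle filter over a 2D road graph would follow the same predict/update pattern.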