We present a multi-modal trajectory generation and selection algorithm for real-world mapless outdoor navigation in human-centered environments. Such environments contain rich features like crosswalks, grass, and curbs, which are easily interpretable by humans but not by mobile robots. We aim to compute suitable trajectories that (1) satisfy environment-specific traversability constraints and (2) generate human-like paths while navigating on crosswalks, sidewalks, etc. Our formulation uses a Conditional Variational Autoencoder (CVAE) generative model, enhanced with traversability constraints, to generate multiple candidate trajectories for global navigation. We develop a visual prompting approach and leverage a Vision Language Model's (VLM) zero-shot capabilities in semantic understanding and logical reasoning to choose the best trajectory given contextual information about the task. We evaluate our method in various outdoor scenes with wheeled robots and compare its performance with that of other global navigation algorithms. In practice, we observe an average improvement of 22.07% in satisfying traversability constraints and of 30.53% in human-like navigation across four different outdoor navigation scenarios.