We present a multi-modal trajectory generation and selection algorithm for real-world mapless outdoor navigation in human-centered environments. Such environments contain rich features like crosswalks, grass, and curbs, which are easily interpretable by humans but not by mobile robots. We aim to compute suitable trajectories that (1) satisfy environment-specific traversability constraints and (2) generate human-like paths while navigating on crosswalks, sidewalks, etc. Our formulation uses a Conditional Variational Autoencoder (CVAE) generative model, enhanced with traversability constraints, to generate multiple candidate trajectories for global navigation. We develop a visual prompting approach and leverage a Vision Language Model's (VLM) zero-shot capabilities in semantic understanding and logical reasoning to choose the best trajectory given contextual information about the task. We evaluate our method in various outdoor scenes with wheeled robots and compare its performance with that of other global navigation algorithms. In practice, we observe an average improvement of 22.07% in satisfying traversability constraints and of 30.53% in human-like navigation across four different outdoor navigation scenarios.