This paper addresses Vision-and-Language Navigation with legged robots, which not only provides a flexible way for humans to issue commands but also allows the robot to navigate more challenging, cluttered scenes. However, translating human language instructions all the way down to low-level leg-joint actions is non-trivial. We propose NaVILA, a two-level framework that unifies a Vision-Language-Action model (VLA) with locomotion skills. Instead of predicting low-level actions directly from the VLA, NaVILA first generates mid-level actions that express spatial information in language (e.g., "moving forward 75cm"), which serve as input to a visual locomotion RL policy for execution. NaVILA substantially improves over previous approaches on existing benchmarks. The same advantages hold on our newly developed IsaacLab benchmarks, which feature more realistic scenes and low-level control, and in real-world robot experiments. We show more results at https://navila-bot.github.io/
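To make the two-level interface concrete, the sketch below shows one way mid-level language actions such as "moving forward 75cm" could be parsed into structured commands for a downstream locomotion policy. This is a minimal illustration under assumed command formats; the function name, patterns, and command vocabulary are hypothetical and not the paper's actual interface.

```python
import re

def parse_midlevel_action(action: str):
    """Parse a mid-level language action into a (command, value) pair.

    Hypothetical parser for illustration only: the real NaVILA system's
    action format and downstream policy interface may differ.
    """
    patterns = {
        "move_forward": r"mov(?:e|ing) forward (\d+(?:\.\d+)?)\s*cm",
        "turn_left": r"turn(?:ing)? left (\d+(?:\.\d+)?)\s*degrees?",
        "turn_right": r"turn(?:ing)? right (\d+(?:\.\d+)?)\s*degrees?",
    }
    text = action.lower()
    for command, pattern in patterns.items():
        match = re.search(pattern, text)
        if match:
            return command, float(match.group(1))
    # Fall back to a stop command when no pattern matches.
    return "stop", 0.0

print(parse_midlevel_action("moving forward 75cm"))   # -> ('move_forward', 75.0)
print(parse_midlevel_action("turn left 30 degrees"))  # -> ('turn_left', 30.0)
```

The parsed (command, value) pair would then be mapped to velocity or displacement targets tracked by the RL locomotion controller.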