Foundation models, e.g., large language models, possess attributes of intelligence that promise to endow a robot with the contextual understanding necessary to navigate complex, unstructured tasks in the wild. For the future of space robotics, we identify three core challenges that motivate adapting a foundation model to space-based applications: 1) scalability of ground-in-the-loop operations; 2) generalization of prior knowledge to novel environments; and 3) multi-modality in tasks and sensor data. Therefore, as a first step toward building a foundation model for space-based applications, we automatically label the AI4Mars dataset to curate a language-annotated dataset of visual question-answer tuples. We fine-tune a pretrained LLaVA checkpoint on this dataset to endow a vision-language model with the ability to perform spatial reasoning and navigation on the Martian surface. In this work, we demonstrate that 1) existing vision-language models are deficient visual reasoners in space-based applications, and 2) fine-tuning a vision-language model on extraterrestrial data significantly improves response quality, even with a limited training dataset of only a few thousand samples.
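The automatic labeling step described above can be sketched as follows. This is a minimal, hypothetical illustration of converting per-image terrain statistics (AI4Mars labels its images with terrain classes such as soil, bedrock, sand, and big rock) into visual question-answer tuples; the class IDs, thresholds, question template, and image identifier are assumptions for illustration, not the paper's actual pipeline.

```python
# Hypothetical sketch: turning AI4Mars terrain-class statistics into VQA tuples.
# Class ids, thresholds, and the question template are illustrative assumptions.
import json

TERRAIN_CLASSES = {0: "soil", 1: "bedrock", 2: "sand", 3: "big rock"}

def mask_to_vqa(image_id: str, class_fractions: dict) -> dict:
    """Build one visual question-answer tuple from per-class pixel fractions."""
    dominant = max(class_fractions, key=class_fractions.get)
    question = "What terrain types are visible, and is the area safe to traverse?"
    # List every terrain class covering more than 5% of the image (assumed cutoff).
    terrains = ", ".join(
        TERRAIN_CLASSES[c] for c, f in class_fractions.items() if f > 0.05
    )
    # Treat the scene as traversable if big rocks cover under 10% (assumed rule).
    safe = "yes" if class_fractions.get(3, 0.0) < 0.1 else "no"
    answer = (
        f"The image mainly shows {TERRAIN_CLASSES[dominant]}; "
        f"visible terrain: {terrains}. Safe to traverse: {safe}."
    )
    return {"image": image_id, "question": question, "answer": answer}

# Example: a scene that is 60% soil, 30% bedrock, 10% big rock.
sample = mask_to_vqa("NLB_123456", {0: 0.6, 1: 0.3, 3: 0.1})
print(json.dumps(sample, indent=2))
```

Language annotations of this shape pair naturally with LLaVA-style instruction tuning, where each training sample is an image plus a question-answer conversation.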