QUART-Online: Latency-Free Large Multimodal Language Model for Quadruped Robot Learning

This paper addresses the inference latency inherent in deploying multimodal large language models (MLLMs) in quadruped vision-language-action (QUAR-VLA) tasks. Our investigation reveals that conventional parameter reduction techniques ultimately impair the performance of the language foundation model during the action instruction tuning phase, making them unsuitable for this purpose. We introduce a novel latency-free quadruped MLLM, dubbed QUART-Online, designed to improve inference efficiency without degrading the performance of the language foundation model. By incorporating Action Chunk Discretization (ACD), we compress the original action representation space, mapping continuous action values onto a smaller set of discrete representative vectors while preserving critical information. We then fine-tune the MLLM to integrate vision, language, and compressed actions into a unified semantic space. Experimental results demonstrate that QUART-Online operates in tandem with the existing MLLM system, achieves real-time inference in sync with the underlying controller frequency, and raises the success rate across various tasks by 65%. Our project page is https://quart-online.github.io.
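The abstract describes ACD only at a high level: continuous actions are grouped into chunks, and each chunk is mapped onto one of a small set of representative vectors. The sketch below illustrates that general idea under stated assumptions and is not the authors' implementation. It uses plain k-means as a stand-in for however the representative vectors are actually learned, and the function names (`chunk_actions`, `fit_codebook`, `encode`, `decode`), the chunk length, and the codebook size are all hypothetical.

```python
import random


def chunk_actions(actions, chunk_len):
    """Group per-step action vectors into flat, non-overlapping chunks.

    Trailing steps that do not fill a whole chunk are dropped.
    """
    chunks = []
    for start in range(0, len(actions) - chunk_len + 1, chunk_len):
        window = actions[start:start + chunk_len]
        chunks.append([value for step in window for value in step])
    return chunks


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, chunk):
    """Index of the codebook vector closest to the given chunk."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], chunk))


def fit_codebook(chunks, k, iters=20, seed=0):
    """Learn k representative vectors with plain k-means (an assumption;
    the abstract does not say how the vectors are obtained)."""
    rng = random.Random(seed)
    codebook = [list(c) for c in rng.sample(chunks, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for c in chunks:
            buckets[nearest(codebook, c)].append(c)
        for i, bucket in enumerate(buckets):
            if bucket:  # keep the old vector if no chunk was assigned
                codebook[i] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return codebook


def encode(codebook, chunks):
    """Map each continuous chunk to a discrete token (a codebook index)."""
    return [nearest(codebook, c) for c in chunks]


def decode(codebook, tokens):
    """Map discrete tokens back to their representative chunk vectors."""
    return [codebook[t] for t in tokens]


# Toy usage: 8 time steps of a 2-dimensional action, chunked in pairs.
actions = [[0.1 * t, -0.1 * t] for t in range(8)]
chunks = chunk_actions(actions, chunk_len=2)
codebook = fit_codebook(chunks, k=2)
tokens = encode(codebook, chunks)
reconstructed = decode(codebook, tokens)
```

In this toy setup a model would only need to emit one token per chunk rather than one value per action dimension per step, which is the kind of compression the abstract attributes to ACD. The reconstruction is lossy whenever the codebook is smaller than the number of distinct chunks.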

