With the rapid development of embodied artificial intelligence, significant progress has been made in vision-language-action (VLA) models for general robot decision-making. However, most existing VLAs fail to account for the inevitable external perturbations encountered during deployment. These perturbations introduce unforeseen state information to the VLA, resulting in inaccurate actions and, consequently, a significant decline in generalization performance. The classic internal model control (IMC) principle shows that a closed-loop system whose internal model incorporates external input signals can accurately track the reference input and effectively reject disturbances. We propose GEVRM, a novel closed-loop VLA method that integrates the IMC principle to enhance the robustness of robot visual manipulation. The text-guided video generation model in GEVRM produces highly expressive future visual planning goals. In parallel, we evaluate perturbations by simulating responses, termed internal embeddings, which are optimized through prototype contrastive learning. This allows the model to implicitly infer and distinguish perturbations arising from the external environment. The proposed GEVRM achieves state-of-the-art performance on both standard and perturbed CALVIN benchmarks and shows significant improvements on realistic robot tasks.
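The abstract does not give the exact loss used to optimize the internal embeddings, but a standard prototype contrastive objective (InfoNCE over learned prototypes) can be sketched as follows. The function name, the temperature `tau`, and the toy shapes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def prototype_contrastive_loss(z, prototypes, labels, tau=0.1):
    """Generic InfoNCE-style prototype contrastive loss (illustrative sketch,
    not the paper's exact formulation): each embedding is pulled toward its
    assigned prototype and pushed away from the others."""
    # L2-normalize so the dot product becomes cosine similarity
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = z @ p.T / tau                       # (batch, num_prototypes)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # negative log-likelihood of each embedding's assigned prototype
    return -log_prob[np.arange(len(labels)), labels].mean()

# Toy example: 4 internal embeddings, 2 perturbation prototypes
rng = np.random.default_rng(0)
z = rng.normal(size=(4, 8))
prototypes = rng.normal(size=(2, 8))
labels = np.array([0, 0, 1, 1])
loss = prototype_contrastive_loss(z, prototypes, labels)
```

Minimizing such a loss clusters embeddings of similarly perturbed states around shared prototypes, which is one way a model could implicitly distinguish external perturbations as the abstract describes.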