QwenGrasp: A Usage of Large Vision-Language Model for Target-Oriented Grasping

Target-oriented grasping in unstructured scenes with language control is essential for intelligent robot arm grasping. The ability for the robot arm to understand the human language and execute corresponding grasping actions is a pivotal challenge. In this paper, we propose a combination model called QwenGrasp which combines a large vision-language model with a 6-DoF grasp neural network. QwenGrasp is able to conduct a 6-DoF grasping task on the target object with textual language instruction. We design a complete experiment with six-dimension instructions to test the QwenGrasp when facing with different cases. The results show that QwenGrasp has a superior ability to comprehend the human intention. Even in the face of vague instructions with descriptive words or instructions with direction information, the target object can be grasped accurately. When QwenGrasp accepts the instruction which is not feasible or not relevant to the grasping task, our approach has the ability to suspend the task execution and provide a proper feedback to humans, improving the safety. In conclusion, with the great power of large vision-language model, QwenGrasp can be applied in the open language environment to conduct the target-oriented grasping task with freely input instructions.

翻译：在非结构化场景中通过语言控制实现目标导向抓取，对于智能机械臂抓取至关重要。使机械臂能够理解人类语言并执行相应抓取动作是一项关键挑战。本文提出一种名为QwenGrasp的组合模型，该模型融合了大规模视觉语言模型与6自由度抓取神经网络。QwenGrasp能够根据文本语言指令对目标物体执行6自由度抓取任务。我们设计了包含六维指令的完整实验，以测试QwenGrasp在不同情况下的表现。结果表明，QwenGrasp具有卓越的人类意图理解能力。即使面对包含描述性词语的模糊指令或带有方向信息的指令，也能精准抓取目标物体。当QwenGrasp接收到不可执行或与抓取任务无关的指令时，该方法能够暂停任务执行并向人类提供适当反馈，从而提升安全性。总之，借助大规模视觉语言模型的强大能力，QwenGrasp可在开放语言环境中通过自由输入指令完成目标导向抓取任务。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

《生成式模型: 变分自编码器与扩散模型》，75页ppt，Google DeepMind科学家Ruiqi Gao

专知会员服务

66+阅读 · 2023年6月10日

【CHI2020-微软】解释可解释性:理解数据科学家使用机器学习的可解释性工具，Interpreting Interpretability: Understanding Data Scientists’Use of Interpretability Tools for Machine Learning

专知会员服务

55+阅读 · 2020年3月8日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日