Despite significant advances in robotic manipulation, achieving consistent and stable grasping remains a fundamental challenge, often limiting the successful execution of complex tasks. Our analysis reveals that even state-of-the-art policy models frequently exhibit unstable grasping behaviors, producing failures that bottleneck real-world robotic applications. To address these challenges, we introduce GraspCorrect, a plug-and-play module designed to enhance grasp performance through vision-language-model-guided feedback. GraspCorrect employs an iterative visual question-answering framework with two key components: grasp-guided prompting, which incorporates task-specific constraints, and object-aware sampling, which ensures that only physically feasible grasp candidates are selected. By iteratively generating intermediate visual goals and translating them into joint-level actions, GraspCorrect significantly improves grasp stability and consistently raises task success rates across existing policy models on the RLBench and CALVIN benchmarks.