In human-robot interaction and collaboration scenarios, robotic grasping still faces numerous challenges. Traditional grasp detection methods generally analyze the entire scene to predict grasps, leading to redundancy and inefficiency. In this work, we reconsider 6-DoF grasp detection from a target-referenced perspective and propose a Target-Oriented Grasp Network (TOGNet). TOGNet specifically targets local, object-agnostic region patches to predict grasps more efficiently. It integrates seamlessly with multimodal human guidance, including language instructions, pointing gestures, and interactive clicks. Our system thus comprises two primary functional modules: a guidance module that identifies the target object in 3D space, and TOGNet, which detects region-focal 6-DoF grasps around the target to facilitate subsequent motion planning. In 50 target-grasping simulation experiments in cluttered scenes, our system improves the success rate by about 13.7%. Real-world experiments further demonstrate that our method excels in various target-oriented grasping scenarios.
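To make the two-stage pipeline concrete, the sketch below illustrates the flow described above: a guidance module resolves multimodal human input to a 3D target location, a local region patch is cropped around that target, and a grasp network predicts 6-DoF grasps from the patch for downstream motion planning. This is a minimal illustrative sketch only; all names (locate_target, crop_region, predict_grasps, the dictionary-based grasp representation) are hypothetical placeholders, not the authors' actual TOGNet interface.

```python
# Hypothetical sketch of a target-oriented grasping pipeline:
# guidance -> local region patch -> region-focal 6-DoF grasp prediction.
import numpy as np

def locate_target(scene_points: np.ndarray, guidance: dict) -> np.ndarray:
    """Stand-in for the guidance module: map multimodal human guidance
    (language, pointing gesture, or click) to a 3D target location.
    Here we simply return the scene centroid as a placeholder."""
    return scene_points.mean(axis=0)

def crop_region(scene_points: np.ndarray, center: np.ndarray,
                radius: float = 0.10) -> np.ndarray:
    """Extract the local, object-agnostic region patch around the target."""
    mask = np.linalg.norm(scene_points - center, axis=1) < radius
    return scene_points[mask]

def predict_grasps(region_points: np.ndarray) -> list:
    """Stand-in for TOGNet: predict 6-DoF grasps (rotation, translation,
    gripper width, score) from the region patch. Returns a dummy grasp."""
    t = region_points.mean(axis=0)
    return [{"rotation": np.eye(3), "translation": t,
             "width": 0.08, "score": 1.0}]

if __name__ == "__main__":
    scene = np.random.rand(2048, 3)                    # placeholder point cloud
    target = locate_target(scene, {"click": (320, 240)})
    region = crop_region(scene, target)
    grasps = predict_grasps(region)                    # handed to motion planning
    print(grasps[0]["translation"])
```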