Single-view RGB-D grasp detection remains a common choice in 6-DoF robotic grasping systems, but it typically requires a depth sensor. While RGB-only 6-DoF grasp methods have been studied recently, their inaccurate geometric representations are not directly suitable for physically reliable robotic manipulation, thereby hindering reliable grasp generation. To address these limitations, we propose MG-Grasp, a novel depth-free 6-DoF grasping framework that achieves high-quality object grasping. Leveraging a two-view 3D foundation model with camera intrinsics/extrinsics, our method reconstructs metric-scale, multi-view-consistent dense point clouds from sparse RGB images and generates stable 6-DoF grasps. Experiments on the GraspNet-1Billion dataset and in the real world demonstrate that MG-Grasp achieves state-of-the-art (SOTA) grasp performance among RGB-based 6-DoF grasping methods.