Single-view RGB-D grasp detection remains a common choice in 6-DoF robotic grasping systems, but it typically requires a depth sensor. While RGB-only 6-DoF grasp methods have been studied recently, their inaccurate geometric representations are not directly suitable for physically reliable robotic manipulation, thereby hindering reliable grasp generation. To address these limitations, we propose MG-Grasp, a novel depth-free 6-DoF grasping framework that achieves high-quality object grasping. Leveraging a two-view 3D foundation model together with camera intrinsics/extrinsics, our method reconstructs metric-scale, multi-view-consistent dense point clouds from sparse RGB images and generates stable 6-DoF grasps. Experiments on the GraspNet-1Billion dataset and in real-world settings demonstrate that MG-Grasp achieves state-of-the-art (SOTA) grasp performance among RGB-based 6-DoF grasping methods.