While traditional grasp detection methods rely on depth sensors, the current trend favors cost-effective RGB images, despite their lack of depth cues. This paper introduces an approach for detecting grasp poses from a single RGB image. To this end, we propose a modular learning network that combines grasp detection with semantic segmentation and is tailored for robots equipped with parallel-plate grippers. Our network not only identifies graspable objects but also fuses prior grasp analyses with semantic segmentation, thereby improving grasp detection precision. Notably, the design is robust to blurred and noisy input images. The key contributions are a trainable network for grasp detection from RGB images, a modular design that facilitates feasible grasp implementation, and an architecture that is robust against common image distortions. We demonstrate the feasibility and accuracy of the proposed approach through practical experiments and evaluations.