Vision-based grasp estimation is an essential part of real-world robotic manipulation. Existing planar grasp estimation algorithms work well in relatively simple scenes. In complex scenes, however, such as cluttered scenes with messy backgrounds and moving objects, these algorithms are prone to generating inaccurate and unstable grasping contact points. In this work, we first study existing planar grasp estimation algorithms and analyze the challenges they face in complex scenes. Second, we design a Pixel-wise Efficient Grasp Generation Network (PEGG-Net) to tackle grasping in complex scenes. PEGG-Net achieves improved state-of-the-art performance on the Cornell dataset (98.9%) and second-best performance on the Jacquard dataset (93.8%), outperforming other existing algorithms without introducing complex structures. Third, PEGG-Net can operate in a closed-loop manner using position-based visual servoing (PBVS) for added robustness in dynamic environments. Finally, we conduct real-world experiments on static, dynamic, and cluttered objects in different complex scenes. The results show that the proposed network achieves a high success rate in grasping irregular objects, household objects, and workshop tools. To benefit the community, our trained model and supplementary materials are available at https://github.com/HZWang96/PEGG-Net.