NBMOD: Find It and Grasp It in Noisy Background

Grasping objects is a fundamental and important capability of robots, and many tasks such as sorting and picking rely on it. Stable grasping requires correctly identifying suitable grasping positions. However, finding appropriate grasping points is challenging because objects differ widely in shape, density distribution, and center of mass. In the past few years, researchers have proposed many methods to address these issues and achieved very good results on publicly available datasets such as the Cornell dataset and the Jacquard dataset. The limitation is that the backgrounds in the Cornell and Jacquard datasets are relatively simple, typically just a whiteboard, whereas backgrounds in real-world operating environments can be complex and noisy. Moreover, in real-world scenarios, robots usually only need to grasp fixed types of objects. To address these issues, we propose a large-scale grasp detection dataset called NBMOD (Noisy Background Multi-Object Dataset for grasp detection), which consists of 31,500 RGB-D images of 20 different types of fruits. Accurate angle prediction has long been a challenging problem in oriented bounding box detection. This paper presents a Rotation Anchor Mechanism (RAM) to address it. Considering the strict real-time requirements of robotic systems, we also propose a series of lightweight architectures called RA-GraspNet (GraspNet with Rotation Anchor): RARA (network with Rotation Anchor and Region Attention), RAST (network with Rotation Anchor and Semi Transformer), and RAGT (network with Rotation Anchor and Global Transformer). Among them, the RAGT-3/3 model achieves an accuracy of 99% on the NBMOD dataset. The NBMOD dataset and our code are available at https://github.com/kmittle/Grasp-Detection-NBMOD.
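
The abstract names a Rotation Anchor Mechanism but does not describe how it works. As a rough illustration only, the sketch below shows one generic way an anchor-based angle encoding can be set up for oriented boxes: the half-circle of grasp angles is split into evenly spaced anchors, and an angle is stored as an anchor index plus a small normalized offset. The function names, the choice of 6 anchors, and the encoding itself are assumptions made for this example and may differ from the authors' RAM.

```python
import math

def encode_angle(theta, num_anchors=6):
    """Encode an angle (radians) as (anchor index, normalized offset).

    Hypothetical illustration: num_anchors and this encoding are assumptions,
    not taken from the paper.
    """
    # A grasp rectangle looks the same after a 180-degree turn, so fold into [0, pi).
    theta = theta % math.pi
    bin_width = math.pi / num_anchors
    # Index of the anchor whose bin contains theta (clamped for float edge cases).
    index = min(int(theta // bin_width), num_anchors - 1)
    center = (index + 0.5) * bin_width
    # Offset from the anchor center, in units of one bin width.
    offset = (theta - center) / bin_width
    return index, offset

def decode_angle(index, offset, num_anchors=6):
    """Invert encode_angle: recover the angle in [0, pi) from index and offset."""
    bin_width = math.pi / num_anchors
    return ((index + 0.5 + offset) * bin_width) % math.pi
```

With 6 anchors each bin spans 30 degrees, so an angle of 100 degrees falls in the bin covering 90 to 120 degrees (index 3) with a small negative offset from the 105-degree center. A network built this way would classify the anchor and regress only the offset, which keeps the regression target in a narrow range.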
