Object recognition and pose estimation for robotic grasping remain significant challenges because building a labelled dataset is time-consuming and costly in both data collection and annotation. In this work, we propose a synthetic data generation method that minimizes human intervention and makes downstream image segmentation algorithms more robust by combining a generated synthetic dataset with a smaller real-world dataset (a hybrid dataset). Annotation experiments show that the proposed synthetic scene generation dramatically reduces labelling time. An RGB image segmentation model is trained on the hybrid dataset and combined with depth information to produce pixel-to-point correspondences for each segmented object. The object to grasp is then selected according to the confidence score of the segmentation algorithm. Pick-and-place experiments demonstrate that segmentation trained on our hybrid dataset (98.9%, 70%) outperforms segmentation trained on the real dataset and on a publicly available dataset by (6.7%, 18.8%) and (2.8%, 10%) in terms of labelling and grasping success rate, respectively. Supplementary material is available at https://sites.google.com/view/synthetic-dataset-generation.
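The target selection and pixel-to-point step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a pinhole camera model, a depth image already aligned to the RGB image, and per-object boolean masks with confidence scores from some segmentation model. The function name `select_grasp_target` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical names introduced here for illustration.

```python
import numpy as np

def select_grasp_target(masks, scores, depth, fx, fy, cx, cy):
    """Pick the highest-confidence mask and back-project its pixels to 3D.

    masks:  (N, H, W) array, one mask per segmented object (assumed input)
    scores: (N,) confidence scores from the segmentation model
    depth:  (H, W) depth image aligned to the RGB image, 0 = no reading
    fx, fy, cx, cy: pinhole intrinsics (assumed known from calibration)
    """
    best = int(np.argmax(scores))                    # most confident object
    mask = masks[best].astype(bool) & (depth > 0)    # drop pixels without depth
    v, u = np.nonzero(mask)                          # row (v) and column (u) indices
    z = depth[v, u]
    x = (u - cx) * z / fx                            # pinhole back-projection
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=1)             # (K, 3) points in the camera frame
    return best, points
```

Each row of `points` corresponds to one mask pixel `(u, v)`, so the result is a per-object point set in the camera frame that a grasp planner could consume. How the grasp pose is derived from these points is outside the scope of this sketch.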