6D pose recognition has been a crucial factor in the success of robotic grasping, and recent deep-learning-based approaches have achieved remarkable results on benchmarks. However, their generalization capabilities in real-world applications remain unclear. To bridge this gap, we introduce 6IMPOSE, a novel framework for sim-to-real data generation and 6D pose estimation. 6IMPOSE consists of four modules: (1) a data generation pipeline that uses the 3D software suite Blender to create synthetic RGBD image datasets with 6D pose annotations; (2) an annotated RGBD dataset of five household objects generated with the proposed pipeline; (3) a real-time, two-stage 6D pose estimation approach that combines the object detector YOLO-V4 with a streamlined version of the 6D pose estimation algorithm PVN3D, optimized for time-sensitive robotics applications; and (4) a codebase designed to facilitate the integration of the vision system into a robotic grasping experiment. Our approach demonstrates the efficient generation of large amounts of photo-realistic RGBD images and the successful transfer of the trained inference model to robotic grasping experiments, achieving an overall success rate of 87% in grasping five different household objects from cluttered backgrounds under varying lighting conditions. This is made possible by the fine-tuning of the data generation and domain randomization techniques and by the optimization of the inference pipeline, which together overcome the generalization and performance shortcomings of the original PVN3D algorithm. Finally, we make the code, the synthetic dataset, and all pretrained models available on GitHub.
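The two-stage structure described above (detect the object first, then estimate its pose from the cropped region only) can be illustrated with a minimal sketch. This is not the 6IMPOSE implementation: the names `Detection`, `clamp_box`, `crop`, and `estimate_poses` are hypothetical, the detector and pose estimator are passed in as plain callables standing in for YOLO-V4 and the streamlined PVN3D, and images are represented as nested lists purely to keep the example dependency-free.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

# Hypothetical stand-in for an H x W image; a real system would use arrays.
Image = List[List[Any]]
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels, x1/y1 exclusive


@dataclass
class Detection:
    """One detector output: a class label and a pixel bounding box."""
    label: str
    box: Box


def clamp_box(box: Box, width: int, height: int, margin: int = 0) -> Box:
    """Grow the box by `margin` pixels and clip it to the image bounds."""
    x0, y0, x1, y1 = box
    return (max(0, x0 - margin), max(0, y0 - margin),
            min(width, x1 + margin), min(height, y1 + margin))


def crop(image: Image, box: Box) -> Image:
    """Return the sub-image covered by `box`."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def estimate_poses(
    rgb: Image,
    depth: Image,
    detect: Callable[[Image], List[Detection]],
    estimate_pose: Callable[[Image, Image], Any],
    margin: int = 0,
) -> List[Tuple[str, Any]]:
    """Stage 1: detect objects in the full RGB frame.
    Stage 2: run the pose estimator on each RGB and depth crop."""
    height, width = len(rgb), len(rgb[0])
    results = []
    for det in detect(rgb):
        box = clamp_box(det.box, width, height, margin)
        pose = estimate_pose(crop(rgb, box), crop(depth, box))
        results.append((det.label, pose))
    return results
```

Restricting the second stage to the detected crop means the pose network processes far fewer pixels per object than it would on the full frame, which is one plausible reason a two-stage design suits time-sensitive use; the margin parameter is an assumption added here to tolerate slightly tight bounding boxes.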