6D pose estimation is critical for meaningful robotic manipulation of objects in the real world. Most existing approaches struggle to generalize to scenarios in which novel object instances are continuously introduced, especially under heavy occlusion. In this work, we propose SA6D, a few-shot pose estimation (FSPE) approach that uses a self-adaptive segmentation module to identify the novel target object and construct a point cloud model of it from only a small number of cluttered reference images. Unlike existing methods, SA6D requires neither object-centric reference images nor any additional object information, making it more generalizable and scalable across categories. We evaluate SA6D on real-world tabletop object datasets and demonstrate that it outperforms existing FSPE methods, particularly in cluttered scenes with occlusions, while requiring fewer reference images.
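As an illustration of the two notions the abstract relies on, a point cloud model built from segmented reference views and a 6D pose (a 3D rotation plus a 3D translation), the following is a minimal sketch and not SA6D's implementation. It assumes that each reference image comes with a depth map, a binary target mask, and pinhole camera intrinsics; the function names `backproject_masked_depth` and `apply_pose` and the parameters `fx`, `fy`, `cx`, `cy` are hypothetical and chosen only for this example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def backproject_masked_depth(
    depth: Sequence[Sequence[float]],
    mask: Sequence[Sequence[int]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Lift masked depth pixels to 3D camera-frame points (pinhole model).

    Assumption: `depth` holds metric depth per pixel and `mask` marks the
    pixels that a segmentation step assigned to the target object.
    """
    points: List[Point] = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (z, keep) in enumerate(zip(depth_row, mask_row)):
            # Skip background pixels and pixels without a valid depth reading.
            if keep and z > 0:
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def apply_pose(
    points: Sequence[Point],
    rotation: Sequence[Sequence[float]],
    translation: Sequence[float],
) -> List[Point]:
    """Apply a 6D pose, i.e. p' = R p + t, to every point."""
    return [
        tuple(
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        )
        for p in points
    ]


# Toy 2x2 reference view: three pixels are masked, one of them has no depth.
depth = [[0.0, 2.0], [1.0, 1.0]]
mask = [[1, 1], [0, 1]]
cloud = backproject_masked_depth(depth, mask, fx=1.0, fy=1.0, cx=0.0, cy=0.0)

# Hypothetical pose: 90 degree rotation about the z-axis, then a shift along z.
rotation_z90 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
moved = apply_pose(cloud, rotation_z90, (0.0, 0.0, 1.0))
```

In this sketch, a pose estimate would be the rotation and translation that align such a reference point cloud with the target object observed in a new scene; how SA6D obtains the mask and the pose is not covered here.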