To interact meaningfully with the world, robot manipulators must be able to interpret the objects they encounter. A critical aspect of this interpretation is pose estimation: inferring quantities that describe the position and orientation of an object in 3D space. Most existing approaches to pose estimation make limiting assumptions, often working only for specific, known object instances, or at best generalising to an object category using large pose-labelled datasets. In this work, we present a method for achieving category-level pose estimation by inspecting just a single object from a desired category. We show that we can subsequently perform accurate pose estimation for unseen objects from an inspected category, and considerably outperform prior work by exploiting multi-view correspondences. We demonstrate that our method runs in real time, enabling a robot manipulator equipped with an RGBD sensor to perform online 6D pose estimation for novel objects. Finally, we showcase our method in a continual learning setting, in which a robot determines whether objects belong to known categories and, if not, uses active perception to produce a one-shot category representation for subsequent pose estimation.