Object pose estimation is a core perception task that enables, for example, object grasping and scene understanding. RGB sensors are widely available, inexpensive, and high-resolution, and CNNs allow fast inference on this modality. Together, these properties make monocular approaches especially well suited for robotics applications. Previous surveys on object pose estimation establish the state of the art across varying modalities, single- and multi-view settings, and datasets and metrics that cover a multitude of applications. We argue, however, that their broad scope makes it harder to identify open challenges specific to monocular approaches and to derive promising future directions for their application in robotics. By providing a unified view of recent publications from both robotics and computer vision, we find that occlusion handling, novel pose representations, and the formalization and improvement of category-level pose estimation remain fundamental challenges that are highly relevant for robotics. Moreover, to further improve robotic performance, handling large object sets, novel objects, and refractive materials, as well as providing uncertainty estimates, are central open challenges that remain largely unsolved. To address them, ontological reasoning, deformability handling, scene-level reasoning, and realistic datasets need to be improved, and the ecological footprint of algorithms needs to be reduced.