Deep Learning-Based Object Pose Estimation: A Comprehensive Survey

Object pose estimation is a fundamental computer vision problem with broad applications in augmented reality and robotics. Over the past decade, deep learning models, owing to their superior accuracy and robustness, have increasingly supplanted conventional algorithms that rely on engineered point pair features. Nevertheless, several challenges persist in contemporary methods, including their dependency on labeled training data, model compactness, robustness under challenging conditions, and their ability to generalize to novel unseen objects. A recent survey that discusses the progress made on different aspects of this area, the outstanding challenges, and promising future directions is missing. To fill this gap, we discuss recent advances in deep learning-based object pose estimation, covering all three formulations of the problem, i.e., instance-level, category-level, and unseen object pose estimation. Our survey also covers multiple input data modalities, degrees of freedom of output poses, object properties, and downstream tasks, providing readers with a holistic understanding of the field. Additionally, it discusses training paradigms of different domains, inference modes, application areas, evaluation metrics, and benchmark datasets, and reports the performance of current state-of-the-art methods on these benchmarks, thereby helping readers select the most suitable method for their application. Finally, the survey identifies key challenges, reviews the prevailing trends along with their pros and cons, and highlights promising directions for future research. We also keep tracking the latest works at https://github.com/CNJianLiu/Awesome-Object-Pose-Estimation.
