6D pose estimation of textureless objects is a valuable but challenging task for many robotic applications. In this work, we propose a framework that addresses this challenge using only RGB images acquired from multiple viewpoints. The core idea of our approach is to decouple 6D pose estimation into two sequential steps: first estimating the 3D translation of each object, and then its 3D rotation. The first step resolves the scale and depth ambiguities inherent in single RGB images. The second step uses the resulting estimates to identify the object orientation, a problem that is greatly simplified once an accurate scale estimate is available. Moreover, to accommodate the multi-modal distribution present in rotation space, we develop an optimization scheme that explicitly handles object symmetries and counteracts measurement uncertainties. Compared with the state-of-the-art multi-view approach, the proposed approach achieves substantial improvements on a challenging 6D pose estimation dataset for textureless objects.
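To illustrate why multiple viewpoints can resolve the depth ambiguity of a single RGB image, the following is a minimal sketch of one possible translation step: each view contributes a ray from its camera centre through a detected 2D object centre, and the 3D translation is taken as the least-squares point closest to all rays. This is an assumption made for illustration, not necessarily the estimator used in this work; the function names (`triangulate_center`, `solve3`, `det3`) and the premise that per-view rays are already expressed in a common world frame are hypothetical.

```python
# Illustrative sketch only: ray-based triangulation of an object centre.
# Assumption: each view supplies a camera centre and a ray direction toward
# the detected object centre, both in a shared world frame.
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(A, b):
    """Solve the 3x3 linear system A x = b with Cramer's rule."""
    d = det3(A)
    if abs(d) < 1e-12:
        raise ValueError("rays are (nearly) parallel; depth is unobservable")
    solution = []
    for k in range(3):
        # Replace column k of A with b.
        Ak = [[b[i] if j == k else A[i][j] for j in range(3)] for i in range(3)]
        solution.append(det3(Ak) / d)
    return solution


def triangulate_center(cam_centers, ray_dirs):
    """Return the point minimising the summed squared distance to all rays.

    For a ray with centre c and unit direction d, the projector
    P = I - d d^T removes the component along the ray, so the normal
    equations are (sum P) x = sum (P c).
    """
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for c, d in zip(cam_centers, ray_dirs):
        norm = math.sqrt(sum(v * v for v in d))
        d = [v / norm for v in d]
        for i in range(3):
            for j in range(3):
                p = (1.0 if i == j else 0.0) - d[i] * d[j]
                A[i][j] += p
                b[i] += p * c[j]
    return solve3(A, b)


# Usage with three hypothetical viewpoints observing the same object centre.
cams = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
true_center = [0.1, -0.2, 1.5]
rays = [[t - c for t, c in zip(true_center, cam)] for cam in cams]
estimated_center = triangulate_center(cams, rays)
```

With a single view, or with parallel rays, the system is singular and the sketch raises an error, which mirrors the depth ambiguity of a single image. Two or more non-parallel rays make the translation well determined, after which the rotation can be searched at a known scale.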