Estimating the 6D pose of objects accurately, quickly, and robustly remains a difficult task. Recent methods that regress poses directly from RGB images using dense features have nevertheless achieved state-of-the-art results. Stereo vision, which provides a second viewpoint of the object, can help reduce pose ambiguity and mitigate occlusion. Moreover, stereo can infer the distance of an object directly, whereas monocular vision must rely on internalized knowledge of the object's size. To extend the state of the art in 6D object pose estimation to stereo, we created a BOP-compatible stereo version of the YCB-V dataset. By exploiting stereo vision, our method outperforms state-of-the-art 6D pose estimation algorithms, and it can easily be adapted to other dense feature-based algorithms.
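The claim that stereo recovers distance directly, while a single camera needs prior knowledge of object size, can be illustrated with textbook pinhole-camera geometry. The sketch below is not the method proposed here; it assumes an idealized, rectified stereo pair, and the function and parameter names (`stereo_depth`, `mono_depth`, `focal_px`, `baseline_m`, and so on) are hypothetical and chosen only for illustration.

```python
def stereo_depth(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth from a rectified stereo pair: Z = f * B / d.

    Assumes an ideal pinhole model. Only camera parameters (focal length,
    baseline) and the measured disparity are needed, not the object's size.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px


def mono_depth(focal_px: float, object_height_m: float, image_height_px: float) -> float:
    """Depth from a single view: Z = f * H / h.

    Requires the true object height H, which a monocular method has to
    know or learn in advance.
    """
    if image_height_px <= 0:
        raise ValueError("projected height must be positive")
    return focal_px * object_height_m / image_height_px


if __name__ == "__main__":
    # Illustrative numbers only (not taken from any dataset).
    print(stereo_depth(focal_px=1000.0, baseline_m=0.1, disparity_px=50.0))
    print(mono_depth(focal_px=1000.0, object_height_m=0.2, image_height_px=100.0))
```

In the monocular case, an error in the assumed object height scales the estimated depth by the same factor, whereas the stereo estimate depends only on calibration and the measured disparity.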