Most recent 6D object pose estimation methods first run an object detector to obtain 2D bounding boxes and only then regress the pose. However, the general-purpose detectors they rely on are ill-suited to cluttered scenes and therefore provide a poor initialization for the subsequent pose network. To address this, we propose a rigidity-aware detection method that exploits the fact that, in 6D pose estimation, the target objects are rigid. This lets us sample positive object regions from the entire visible object area during training, instead of naively drawing samples from the bounding box center, where the object might be occluded. As a result, every visible object part can contribute to the final bounding box prediction, which makes detection more robust. Key to the success of our approach is a visibility map, which we propose to build using a minimum barrier distance between every pixel in the bounding box and the box boundary. Our results on seven challenging 6D pose estimation datasets show that our method outperforms general detection frameworks by a large margin. Furthermore, combined with a pose regression network, we obtain state-of-the-art pose estimation results on the challenging BOP benchmark.
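To make the minimum barrier distance concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single-channel image patch cropped to the bounding box, treats the patch border pixels as seeds, and approximates the distance with a few alternating raster scans (a common approximation of the exact minimum barrier distance). The function name `mbd_to_boundary` and the parameter `n_passes` are hypothetical. The barrier of a path is its maximum intensity minus its minimum intensity, and the distance of a pixel is the smallest barrier over paths connecting it to the border.

```python
import numpy as np


def mbd_to_boundary(patch, n_passes=3):
    """Approximate minimum barrier distance from every pixel to the patch border.

    Hypothetical helper for illustration: `patch` is a 2D grayscale crop of the
    bounding box, and the border pixels of the crop act as the seed set.
    """
    img = np.asarray(patch, dtype=np.float64)
    h, w = img.shape

    # Distance is 0 on the border (seeds) and unknown (inf) elsewhere.
    dist = np.full((h, w), np.inf)
    dist[0, :] = dist[-1, :] = 0.0
    dist[:, 0] = dist[:, -1] = 0.0

    # Highest and lowest intensity seen along the best path found so far.
    hi = img.copy()
    lo = img.copy()

    for p in range(n_passes):
        forward = (p % 2 == 0)
        rows = range(h) if forward else range(h - 1, -1, -1)
        cols = range(w) if forward else range(w - 1, -1, -1)
        # Forward scans look at the upper and left neighbours,
        # backward scans at the lower and right neighbours.
        step = -1 if forward else 1
        for r in rows:
            for c in cols:
                for nr, nc in ((r + step, c), (r, c + step)):
                    if not (0 <= nr < h and 0 <= nc < w):
                        continue
                    if not np.isfinite(dist[nr, nc]):
                        continue  # neighbour not yet connected to a seed
                    # Extend the neighbour's path by the current pixel.
                    new_hi = max(hi[nr, nc], img[r, c])
                    new_lo = min(lo[nr, nc], img[r, c])
                    cost = new_hi - new_lo
                    if cost < dist[r, c]:
                        dist[r, c] = cost
                        hi[r, c] = new_hi
                        lo[r, c] = new_lo
    return dist
```

Under these assumptions, a pixel separated from the box boundary by a strong intensity barrier receives a large value, while pixels that blend into the border region receive values near zero. How such a map is turned into the visibility map and into a sampling rule for positive regions is not specified in the abstract, so that step is left out here.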