Recent work on 6D object pose estimation focuses on learning keypoint correspondences between images and object models, and then either determining the object pose with RANSAC-based algorithms or directly regressing it through end-to-end optimisation. We argue that learning discriminative point-level features has been overlooked in the literature. To this end, we revisit Fully Convolutional Geometric Features (FCGF) and tailor it to 6D object pose estimation to achieve state-of-the-art performance. FCGF employs sparse convolutions and learns point-level features with a fully convolutional network by optimising a hardest contrastive loss. We outperform recent competitors on popular benchmarks by making key modifications to the loss and to the input data representation, by carefully tuning the training strategy, and by employing data augmentations suited to the underlying problem. We carry out a thorough ablation study to assess the contribution of each modification.
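To make the hardest contrastive loss mentioned above concrete, the following is a minimal NumPy sketch of its general form: matching features are pulled to within a positive margin, and each feature is pushed away from its closest non-matching feature up to a negative margin. This is an illustration under stated assumptions, not the modified loss proposed in this work. The function name `hardest_contrastive_loss`, the margin values, the dense all-pairs negative mining, and the plain averaging of the negative terms are all simplifications chosen for the example.

```python
import numpy as np


def hardest_contrastive_loss(feats_a, feats_b, pos_margin=0.1, neg_margin=1.4):
    """Simplified hardest contrastive loss (illustrative sketch).

    feats_a, feats_b: arrays of shape (N, D), where feats_a[i] and
    feats_b[i] are the features of a corresponding point pair.
    The margin defaults are assumed values used only for illustration.
    """
    # Euclidean distance between every feature in A and every feature in B.
    diff = feats_a[:, None, :] - feats_b[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)  # shape (N, N)

    # Distances of the true correspondences lie on the diagonal.
    pos_dist = np.diag(dist)

    # Exclude true matches so they cannot be mined as negatives.
    masked = dist.copy()
    np.fill_diagonal(masked, np.inf)
    hardest_neg_a = masked.min(axis=1)  # closest non-match in B for each a_i
    hardest_neg_b = masked.min(axis=0)  # closest non-match in A for each b_i

    # Hinge terms: penalise positives farther apart than pos_margin and
    # hardest negatives closer than neg_margin.
    pos_term = np.maximum(pos_dist - pos_margin, 0.0) ** 2
    neg_term_a = np.maximum(neg_margin - hardest_neg_a, 0.0) ** 2
    neg_term_b = np.maximum(neg_margin - hardest_neg_b, 0.0) ** 2

    return pos_term.mean() + 0.5 * (neg_term_a.mean() + neg_term_b.mean())
```

In this sketch, features that are already well separated incur zero loss, while features that collapse to a single point are penalised only through the negative terms. A practical implementation would typically mine negatives from a subsampled candidate set rather than from all pairs, since the dense distance matrix grows quadratically with the number of points.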