Object location priors have been shown to be critical for the standard 6D object pose estimation setting, where the training and testing objects are the same. Specifically, they can be used to initialize the 3D object translation and facilitate 3D object rotation estimation. Unfortunately, the object detectors that are used for this purpose do not generalize to unseen objects, i.e., objects from new categories encountered at test time. Therefore, existing 6D pose estimation methods for previously unseen objects either assume the ground-truth object location to be known, or yield inaccurate results when it is unavailable. In this paper, we address this problem by developing a method, LocPoseNet, that robustly learns location priors for unseen objects. Our method builds upon a template matching strategy, where we propose to distribute the reference kernels and convolve them with a query to efficiently compute multi-scale correlations. We then introduce a novel translation estimator, which decouples scale-aware and scale-robust features to predict different object location parameters. Our method outperforms existing works by a large margin on LINEMOD and GenMOP. We further construct a challenging synthetic dataset, which allows us to highlight the greater robustness of our method to various noise sources.
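To make the template matching idea concrete, the following is a minimal sketch of multi-scale correlation between a reference and a query feature map. It is not the LocPoseNet implementation: the abstract does not specify how the reference kernels are distributed, so this sketch assumes a single reference feature map that is resized by nearest-neighbour sampling to several kernel sizes and compared with the query by cosine similarity. The function names (`resize_nearest`, `correlate`, `multi_scale_correlation`) and the kernel sizes are hypothetical and chosen for illustration only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def resize_nearest(kernel, size):
    """Nearest-neighbour resize of a (C, h, w) kernel to (C, size, size)."""
    _, h, w = kernel.shape
    rows = np.arange(size) * h // size  # source row for each target row
    cols = np.arange(size) * w // size  # source column for each target column
    return kernel[:, rows[:, None], cols[None, :]]


def correlate(query, kernel, eps=1e-8):
    """Cosine-similarity map between a (C, H, W) query and a (C, k, k) kernel.

    Only positions where the kernel fits entirely inside the query are
    evaluated, so the output has shape (H - k + 1, W - k + 1).
    """
    k = kernel.shape[-1]
    # windows has shape (C, H - k + 1, W - k + 1, k, k)
    windows = sliding_window_view(query, (k, k), axis=(1, 2))
    dot = np.einsum("cijuv,cuv->ij", windows, kernel)
    window_norm = np.sqrt(np.einsum("cijuv,cijuv->ij", windows, windows))
    return dot / (window_norm * np.linalg.norm(kernel) + eps)


def multi_scale_correlation(query, reference, sizes=(3, 5, 7)):
    """Correlate the query with the reference resized to each kernel size.

    The sizes are an assumption of this sketch, not values from the paper.
    """
    return {s: correlate(query, resize_nearest(reference, s)) for s in sizes}
```

Each returned map peaks where the query most resembles the reference at that kernel size, so comparing peaks across sizes gives a rough cue for both the object's image location and its apparent scale. A learned estimator, such as the translation estimator described in the abstract, would consume such correlations rather than taking a plain argmax.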