Two-frame relative pose optimization is a fundamental problem in computer vision. Two kinds of error are primarily used: photometric error and re-projection error. The choice of error usually follows directly from the choice of feature paradigm: photometric features or geometric features. That choice is a trade-off between accuracy, robustness, and the possibility of loop closing. We investigate a third method that combines the strengths of both paradigms in a unified approach. Using densely sampled geometric feature descriptors, we replace the photometric error with a descriptor residual computed over a dense set of descriptors, thereby combining the sub-pixel accuracy of differential photometric methods with the expressiveness of geometric feature descriptors. Experiments show that although the proposed strategy is an interesting approach that yields accurate tracking, it ultimately does not outperform pose optimization strategies based on re-projection error, despite using more information. We analyze the underlying reason for this discrepancy and hypothesize that the descriptor similarity metric varies too slowly and does not necessarily correspond strictly to keypoint placement accuracy.
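To make the distinction between the error types concrete, the following is a minimal sketch of how the residuals might be evaluated for a single 3D point. It is not the authors' implementation. It assumes a pinhole camera with intrinsics `(fx, fy, cx, cy)`, a known depth for the point in the first frame, and bilinear interpolation for sub-pixel lookups. All function names (`project`, `bilinear`, `reprojection_residual`, `dense_residual`) are hypothetical. In this sketch the photometric residual is simply the one-channel case of the dense descriptor residual.

```python
import math

def project(K, R, t, X):
    """Map a 3D point X (frame 1 coordinates) to a pixel (u, v) in frame 2.

    K = (fx, fy, cx, cy) is an assumed pinhole intrinsics tuple;
    R (3x3 nested list) and t (length 3) are the relative pose being optimized.
    """
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    fx, fy, cx, cy = K
    return (fx * Xc[0] / Xc[2] + cx, fy * Xc[1] / Xc[2] + cy)

def bilinear(grid, u, v):
    """Sample grid[row][col] (a list of channel values) at sub-pixel (u, v)."""
    x0 = min(max(int(math.floor(u)), 0), len(grid[0]) - 2)
    y0 = min(max(int(math.floor(v)), 0), len(grid) - 2)
    ax, ay = u - x0, v - y0
    weighted = [
        ((1 - ax) * (1 - ay), grid[y0][x0]),
        (ax * (1 - ay), grid[y0][x0 + 1]),
        ((1 - ax) * ay, grid[y0 + 1][x0]),
        (ax * ay, grid[y0 + 1][x0 + 1]),
    ]
    channels = len(grid[0][0])
    return [sum(w * cell[c] for w, cell in weighted) for c in range(channels)]

def reprojection_residual(K, R, t, X, uv_matched):
    """Geometric error: projected pixel minus the matched keypoint location."""
    u, v = project(K, R, t, X)
    return [u - uv_matched[0], v - uv_matched[1]]

def dense_residual(K, R, t, X, ref_values, target_map):
    """Dense error: values sampled at the warped pixel minus reference values.

    With a 1-channel intensity map this is a photometric residual; with a
    D-channel dense descriptor map it is a descriptor residual.
    """
    u, v = project(K, R, t, X)
    warped = bilinear(target_map, u, v)
    return [w - r for w, r in zip(warped, ref_values)]
```

The re-projection residual lives in pixel coordinates and depends on a discrete keypoint match. The dense residual lives in intensity or descriptor space and varies continuously with the pose through the interpolated lookup, which is what allows differential (gradient-based) refinement at sub-pixel resolution.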