Matching landmark patches in a real-time image captured by an on-vehicle camera against landmark patches in an image database plays an important role in many computer perception tasks for autonomous driving. Current methods focus on local matching of regions of interest and ignore the spatial neighborhood relationships among image patches, which typically correspond to objects in the environment. In this paper, we construct a spatial graph whose vertices correspond to patches and whose edges capture spatial neighborhood information. We propose a joint feature and metric learning model with graph-based learning. We provide a theoretical basis for the graph-based loss by showing that our framework maximizes the information distance between the distributions conditioned on matched and unmatched pairs. We evaluate our model on several street-scene datasets and demonstrate that our approach achieves state-of-the-art matching results.
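To make the spatial graph concrete, the sketch below builds one from patch bounding boxes. The abstract does not say how neighborhood edges are defined, so the k-nearest-neighbor rule on patch centers, the `(x_min, y_min, x_max, y_max)` box format, and the function name `build_spatial_graph` are all illustrative assumptions rather than the paper's construction.

```python
import math


def build_spatial_graph(boxes, k=2):
    """Link each patch to its k nearest patches by center distance.

    Assumption (not from the paper): patches are axis-aligned boxes given as
    (x_min, y_min, x_max, y_max), and "spatial neighborhood" is approximated
    by k-nearest neighbors between box centers.

    Returns a sorted list of undirected edges (i, j) with i < j, where i and j
    index into `boxes` (the graph vertices).
    """
    centers = [((x0 + x1) / 2.0, (y0 + y1) / 2.0) for x0, y0, x1, y1 in boxes]
    n = len(centers)
    edges = set()
    for i in range(n):
        # Rank every other patch by Euclidean distance between centers.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(centers[i], centers[j]),
        )
        # Store each edge once, smaller index first, so the graph is undirected.
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


# Hypothetical patches: two adjacent boxes and one far away.
patches = [(0, 0, 10, 10), (12, 0, 22, 10), (100, 100, 110, 110)]
graph_edges = build_spatial_graph(patches, k=1)
```

With `k=1`, the two adjacent patches are linked to each other and the distant patch attaches to the nearer of the two. A graph-based model could then aggregate patch features along these edges; other edge rules, such as box overlap or a distance threshold, would fit the same interface.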