Mesh-based scene representation offers a promising direction for simplifying large-scale hierarchical visual localization pipelines, which combine a visual place recognition step based on global features (retrieval) with a visual localization step based on local features. While existing work demonstrates the viability of meshes for visual localization, the impact of using synthetic databases rendered from them in visual place recognition remains largely unexplored. In this work, we investigate using dense 3D textured meshes for large-scale Visual Place Recognition (VPR). We identify a significant performance drop when retrieval uses synthetic mesh-based image databases instead of real-world images. To address this, we propose MeshVPR, a novel VPR pipeline that uses a lightweight feature alignment framework to bridge the gap between the real-world and synthetic domains. MeshVPR leverages pre-trained VPR models and is efficient and scalable for city-wide deployments. We introduce novel datasets with freely available 3D meshes and manually collected queries from Berlin, Paris, and Melbourne. Extensive evaluations demonstrate that MeshVPR achieves performance competitive with standard VPR pipelines, paving the way for mesh-based localization systems. Data, code, and interactive visualizations are available at https://meshvpr.github.io/