MeshVPR: Citywide Visual Place Recognition Using 3D Meshes

Mesh-based scene representation offers a promising direction for simplifying large-scale hierarchical visual localization pipelines, which combine a visual place recognition step based on global features (retrieval) with a visual localization step based on local features. While existing work demonstrates the viability of meshes for visual localization, the impact of using synthetic databases rendered from them in visual place recognition remains largely unexplored. In this work, we investigate using dense 3D textured meshes for large-scale Visual Place Recognition (VPR). We identify a significant performance drop when retrieval uses synthetic mesh-based image databases instead of real-world images. To address this, we propose MeshVPR, a novel VPR pipeline that uses a lightweight feature alignment framework to bridge the gap between the real-world and synthetic domains. MeshVPR leverages pre-trained VPR models and is efficient and scalable for city-wide deployments. We introduce novel datasets with freely available 3D meshes and manually collected queries from Berlin, Paris, and Melbourne. Extensive evaluations demonstrate that MeshVPR achieves performance competitive with standard VPR pipelines, paving the way for mesh-based localization systems. Data, code, and interactive visualizations are available at https://meshvpr.github.io/
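As a rough illustration of the retrieval step mentioned above, the following minimal sketch ranks database images by cosine similarity between global descriptors. It is not the MeshVPR implementation: the function names `l2_normalize` and `retrieve_top_k` and the three-dimensional toy descriptors are assumptions made for illustration, and the feature alignment between real and synthetic domains is not modeled here. In a real pipeline the descriptors would come from a pre-trained VPR model.

```python
import math


def l2_normalize(vec):
    """Scale a descriptor to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def retrieve_top_k(query_desc, database_descs, k=1):
    """Return indices of the k database descriptors most similar to the query.

    Similarity is the cosine similarity between L2-normalized descriptors.
    Ties are broken by the lower database index.
    """
    q = l2_normalize(query_desc)
    scores = []
    for idx, desc in enumerate(database_descs):
        d = l2_normalize(desc)
        similarity = sum(a * b for a, b in zip(q, d))
        scores.append((similarity, idx))
    # Sort by descending similarity, then ascending index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [idx for _, idx in scores[:k]]


# Hypothetical toy descriptors standing in for a database rendered from a mesh.
database = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
# Hypothetical descriptor of a real-world query image.
query = [0.9, 0.1, 0.0]

top2 = retrieve_top_k(query, database, k=2)
print(top2)
```

In this toy setup the query lies closest to the first database descriptor, followed by the third. A domain gap between real queries and synthetic database images would show up as lower similarity for the correct match, which is the problem the abstract says the alignment framework targets.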

