A Survey on Monocular Re-Localization: From the Perspective of Scene Map Representation

Monocular Re-Localization (MRL) is a critical component of numerous autonomous applications: it estimates a 6 degree-of-freedom (6-DoF) pose with respect to a scene map from a single monocular image. In recent decades, MRL techniques have progressed significantly, and numerous landmark algorithms have achieved remarkable localization accuracy and robustness against visual interference. In MRL research, scene maps are represented in various forms, and the representation determines how MRL methods work and even how well they perform. However, to the best of our knowledge, existing surveys do not systematically review MRL from the perspective of the map. This survey fills the gap by comprehensively reviewing MRL methods that use monocular cameras as the main sensor, with the aim of promoting further research. 1) We begin with the problem definition of MRL and its current challenges, and compare this survey with previously published surveys. 2) MRL methods are then categorized into five classes according to the representation form of the map they use, including geo-tagged frames, visual landmarks, point clouds, and vectorized semantic maps, and we review the milestone MRL works in each category. 3) To compare MRL methods with different maps quantitatively and fairly, we also review some public datasets and report the performance of some typical MRL methods, and we analyze the strengths and weaknesses of each type of method. 4) Finally, we introduce some topics of interest in this field and give our personal opinions. This survey can serve as a valuable reference for newcomers and researchers interested in MRL. A continuously updated summary of this survey, including the reviewed papers and datasets, is publicly available to the community at: https://github.com/jinyummiao/map-in-mono-reloc.
