We introduce Flatlandia, a novel problem for visual localization of an image from object detections, composed of two specific tasks: i) Coarse Map Localization: localizing a single image observing a set of objects with respect to a 2D map of object landmarks; ii) Fine-grained 3DoF Localization: estimating the latitude, longitude, and orientation of the image within a 2D map. Solutions for these new tasks exploit the wide availability of open urban maps annotated with GPS locations of common objects (e.g., via surveying or crowd-sourcing). Such maps are also more storage-friendly than the large-scale 3D models often used in visual localization, while additionally being privacy-preserving. As existing datasets are unsuited for the proposed problem, we provide the Flatlandia dataset, designed for 3DoF visual localization in multiple urban settings and based on crowd-sourced data from five European cities. We use the Flatlandia dataset to validate the complexity of the proposed tasks.
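To make the 3DoF formulation concrete, the following is a minimal illustrative sketch, not the authors' method. It assumes a local metric map frame (east/north offsets in metres) instead of raw latitude and longitude, and all names (`Pose3DoF`, `relative_bearing`, `visible_landmarks`, the 90-degree field of view) are hypothetical. Given a candidate pose, it computes the bearing of each 2D landmark relative to the camera heading and lists the landmarks that would fall inside the horizontal field of view.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose3DoF:
    """Hypothetical 3DoF camera pose on a 2D map (assumed local metric frame)."""
    x: float      # east offset in metres
    y: float      # north offset in metres
    theta: float  # heading in radians, counter-clockwise from the +x axis


def relative_bearing(pose, landmark):
    """Signed angle from the camera heading to the landmark, wrapped to [-pi, pi]."""
    dx = landmark[0] - pose.x
    dy = landmark[1] - pose.y
    angle = math.atan2(dy, dx) - pose.theta
    # Wrap the difference so that angles just left and right of the heading stay small.
    return math.atan2(math.sin(angle), math.cos(angle))


def visible_landmarks(pose, landmarks, fov=math.radians(90)):
    """Names of landmarks whose bearing lies within half the field of view."""
    return [
        name
        for name, xy in landmarks.items()
        if abs(relative_bearing(pose, xy)) <= fov / 2
    ]


# Toy map: object landmarks as (x, y) positions in metres (made-up values).
toy_map = {"bench": (10.0, 0.0), "lamp": (0.0, 10.0), "sign": (10.0, 3.0)}
camera = Pose3DoF(x=0.0, y=0.0, theta=0.0)
in_view = visible_landmarks(camera, toy_map)
```

In this toy setting, a fine-grained localizer would search for the pose whose predicted landmark bearings best agree with the object detections in the image; the sketch only covers the geometric forward step.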