We introduce Flatlandia, a novel problem for visual localization of an image from object detections composed of two specific tasks: i) Coarse Map Localization: localizing a single image observing a set of objects in respect to a 2D map of object landmarks; ii) Fine-grained 3DoF Localization: estimating latitude, longitude, and orientation of the image within a 2D map. Solutions for these new tasks exploit the wide availability of open urban maps annotated with GPS locations of common objects (\eg via surveying or crowd-sourced). Such maps are also more storage-friendly than standard large-scale 3D models often used in visual localization while additionally being privacy-preserving. As existing datasets are unsuited for the proposed problem, we provide the Flatlandia dataset, designed for 3DoF visual localization in multiple urban settings and based on crowd-sourced data from five European cities. We use the Flatlandia dataset to validate the complexity of the proposed tasks.
翻译:我们提出Flatlandia这一新问题,旨在通过目标检测实现图像的视觉定位,包含两个具体任务:(i)粗粒度地图定位:根据观测到的目标集合,相对于由地标目标构成的二维地图对单张图像进行定位;(ii)细粒度三自由度定位:在二维地图中估计图像的经纬度与朝向。这些新任务的解决方案利用了开放城市地图的广泛可用性,此类地图通过测量或众包标注了常见目标的GPS坐标。相比视觉定位中常用的大规模三维模型,这类地图更节省存储空间,同时具有隐私保护特性。由于现有数据集不适用于所提出的问题,我们提供了Flatlandia数据集,该数据集基于五个欧洲城市众包数据设计,支持多种城市场景下的三自由度视觉定位。我们通过Flatlandia数据集验证了所提任务的复杂性。