We introduce Flatlandia, a novel problem for visual localization of an image from object detections, composed of two specific tasks: i) Coarse Map Localization: localizing a single image observing a set of objects with respect to a 2D map of object landmarks; ii) Fine-grained 3DoF Localization: estimating the latitude, longitude, and orientation of the image within a 2D map. Solutions for these new tasks exploit the wide availability of open urban maps annotated with GPS locations of common objects (e.g., via surveying or crowd-sourcing). Such maps are also more storage-friendly than the standard large-scale 3D models often used in visual localization, while additionally being privacy-preserving. As existing datasets are unsuited for the proposed problem, we provide the Flatlandia dataset, designed for 3DoF visual localization in multiple urban settings and based on crowd-sourced data from five European cities. We use the Flatlandia dataset to validate the complexity of the proposed tasks.