Street-view imagery offers a novel way to explore different places remotely. Carefully calibrated street-view images (e.g. Google Street View) can be used for downstream tasks such as navigation and map feature extraction. As high-quality personal cameras have become much more affordable and portable, an enormous number of crowdsourced street-view images have been uploaded to the internet, but commonly with missing or noisy sensor information. To bring this hidden treasure to "ready-to-use" status, determining the missing location information and the camera orientation angles are two equally important tasks. Recent methods have achieved high performance on geo-localization of street-view images by cross-view matching against a pool of geo-referenced satellite imagery. However, most existing works focus more on geo-localization than on estimating the image orientation. In this work, we restate the importance of finding fine-grained orientation for street-view images, formally define the problem, and provide a set of evaluation metrics to assess the quality of the orientation estimation. We propose two methods to improve the granularity of the orientation estimation, achieving 82.4% and 72.3% accuracy for images with estimated angle errors below 2 degrees on the CVUSA and CVACT datasets, corresponding to 34.9% and 28.2% absolute improvement over previous works. Integrating fine-grained orientation estimation into training also improves geo-localization performance, giving top-1 recall of 95.5%/85.5% and 86.8%/80.4% for the orientation known/unknown tests on the two datasets.
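The accuracy figure quoted above (the share of images whose estimated angle error is below 2 degrees) can be illustrated with a minimal sketch. The abstract does not give the paper's formal metric definitions, so the wrap-around error, the strict "below" comparison, and the function names `angular_error_deg` and `accuracy_within` are all assumptions made for illustration only.

```python
def angular_error_deg(pred_deg, true_deg):
    """Smallest absolute difference between two headings, in degrees.

    Assumption: orientation is a heading on a 360-degree circle, so the
    error wraps around (359.5 vs 0.5 is an error of 1, not 359).
    """
    diff = (pred_deg - true_deg) % 360.0
    return min(diff, 360.0 - diff)


def accuracy_within(preds_deg, truths_deg, threshold_deg=2.0):
    """Fraction of samples whose angular error is strictly below the threshold."""
    if len(preds_deg) != len(truths_deg) or len(preds_deg) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        1
        for p, t in zip(preds_deg, truths_deg)
        if angular_error_deg(p, t) < threshold_deg
    )
    return hits / len(preds_deg)


if __name__ == "__main__":
    # Hypothetical headings, not data from the paper.
    preds = [359.5, 10.0, 181.0, 90.0]
    truths = [0.5, 14.0, 179.5, 90.0]
    print(accuracy_within(preds, truths, threshold_deg=2.0))
```

Taking the error modulo 360 matters here: without it, predictions on either side of north would be counted as badly wrong even when they are nearly correct.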