In this paper, we present SPVLoc, a global indoor localization method that accurately determines the six-dimensional (6D) camera pose of a query image while requiring minimal scene-specific prior knowledge and no scene-specific training. Our approach employs a novel matching procedure to localize the perspective camera's viewport, given as an RGB image, within a set of panoramic semantic layout representations of the indoor environment. The panoramas are rendered from an untextured 3D reference model, which comprises only approximate structural information about room shapes, along with door and window annotations. We demonstrate that a straightforward convolutional network structure can successfully achieve image-to-panorama and, ultimately, image-to-model matching. Using a viewport classification score, we rank the reference panoramas and select the best match for the query image. A 6D relative pose is then estimated between the chosen panorama and the query image. Our experiments demonstrate that this approach not only efficiently bridges the domain gap but also generalizes well to previously unseen scenes that are not part of the training data. Moreover, it achieves superior localization accuracy compared to state-of-the-art methods while also estimating more degrees of freedom of the camera pose. Our source code is publicly available at https://fraunhoferhhi.github.io/spvloc.
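The retrieval step described above — ranking reference panoramas by a viewport classification score and selecting the best match before relative pose estimation — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the scores here are placeholders standing in for the network's per-panorama outputs, and `rank_panoramas` is a hypothetical helper name.

```python
def rank_panoramas(scores):
    """Return panorama indices sorted by descending classification score.

    `scores` stands in for the viewport classification scores that the
    matching network would produce for each reference panorama; in the
    actual method these come from the convolutional matching model.
    """
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Placeholder scores for three reference panoramas of a scene.
scores = [0.12, 0.87, 0.45]

ranking = rank_panoramas(scores)
best_panorama = ranking[0]  # the 6D relative pose would be estimated
                            # between this panorama and the query image
```

The selected panorama then serves as the anchor for the subsequent 6D relative pose estimation between panorama and query image.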