In this paper, we present SPVLoc, a global indoor localization method that accurately determines the six-dimensional (6D) camera pose of a query image and requires minimal scene-specific prior knowledge and no scene-specific training. Our approach employs a novel matching procedure to localize the perspective camera's viewport, given as an RGB image, within a set of panoramic semantic layout representations of the indoor environment. The panoramas are rendered from an untextured 3D reference model, which comprises only approximate structural information about room shapes, along with door and window annotations. We demonstrate that a straightforward convolutional network structure can successfully achieve image-to-panorama and ultimately image-to-model matching. Through a viewport classification score, we rank the reference panoramas and select the best match for the query image. Then, a 6D relative pose is estimated between the chosen panorama and the query image. Our experiments demonstrate that this approach not only efficiently bridges the domain gap but also generalizes well to previously unseen scenes that are not part of the training data. Moreover, it achieves superior localization accuracy compared to state-of-the-art methods and also estimates more degrees of freedom of the camera pose. We will make our source code publicly available at https://github.com/fraunhoferhhi/spvloc.
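The last two steps described in the abstract, ranking reference panoramas by a score and then turning a relative pose into an absolute camera pose, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that poses are stored as 4x4 homogeneous matrices, that each reference panorama has a known pose in the model frame, and that the absolute pose is obtained by composing the panorama pose with the estimated relative pose. The names select_best_panorama, make_pose, localize, and estimate_relative_pose are hypothetical; estimate_relative_pose stands in for the network's pose regression.

```python
import numpy as np


def select_best_panorama(scores):
    """Return the index of the reference panorama with the highest score."""
    scores = np.asarray(scores, dtype=float)
    if scores.size == 0:
        raise ValueError("need at least one reference panorama")
    return int(np.argmax(scores))


def make_pose(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    pose = np.eye(4)
    pose[:3, :3] = np.asarray(rotation, dtype=float)
    pose[:3, 3] = np.asarray(translation, dtype=float)
    return pose


def localize(scores, world_from_pano, estimate_relative_pose):
    """Pick the best-scoring panorama and compose an absolute camera pose.

    scores: one score per reference panorama (assumed: higher is better).
    world_from_pano: list of 4x4 panorama poses in the model frame (assumed known).
    estimate_relative_pose: hypothetical callable, index -> 4x4 camera pose
        expressed in the chosen panorama's frame.
    """
    best = select_best_panorama(scores)
    pano_from_camera = estimate_relative_pose(best)
    # Assumed convention: world_from_camera = world_from_pano @ pano_from_camera
    world_from_camera = world_from_pano[best] @ pano_from_camera
    return best, world_from_camera


# Toy usage: two panoramas, the second one scores higher.
rot_z_90 = np.array([[0.0, -1.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0]])
pano_poses = [make_pose(np.eye(3), [0.0, 0.0, 1.5]),
              make_pose(np.eye(3), [4.0, 0.0, 1.5])]
best, pose = localize([0.2, 0.9], pano_poses,
                      lambda i: make_pose(rot_z_90, [1.0, 0.0, 0.0]))
print(best, pose[:3, 3])
```

In this toy setup the rotation and translation values are made up for illustration; only the structure (rank, select, compose) mirrors the pipeline outlined above.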