COMPASS: COmpact Multi-channel Prior-map And Scene Signature for Floor-Plan-Based Visual Localization

Architectural floor plans are widely available priors that encode not only the geometry of an environment but also its semantics, yet existing localization methods largely ignore this semantic information. To address this gap, we present COMPASS, an algorithm that exploits both geometric and semantic priors from floor plans to estimate the pose of a robot equipped with dual fisheye cameras. Inspired by the Scan Context descriptor from LiDAR-based place recognition, we design a multi-channel radial descriptor that encodes the structural layout surrounding a position. On the floor-plan side, rays are cast into 360 azimuth bins and the results are encoded in five channels: normalized range, structural hit type (wall, window, or opening), range gradient, inverse range, and local range variance. On the image side, the same descriptor structure is populated by detecting structural elements in the fisheye imagery. As a first step toward full cross-modal matching, we present a window detection algorithm for fisheye images that applies a line segment detector and identifies window frames through vertical edge clustering and brightness verification. Detected windows are projected to azimuthal bearings through the fisheye camera model, producing the hit-type channel of the visual descriptor. As a proof of concept, we generate both descriptors at a single known pose from the Hilti-Trimble SLAM Challenge 2026 dataset and show that the wall-window pattern extracted from the first frame of each camera closely matches the floor-plan descriptor, supporting the feasibility of cross-modal structural matching.
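
The floor-plan side of the descriptor can be illustrated with a minimal sketch. It assumes the floor plan is available as labeled 2D line segments, and it is not the authors' implementation: the label codes, the 20 m range cap, the central-difference gradient, the sliding-window width, and the function names `cast_rays` and `build_descriptor` are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical label codes and range cap (assumptions, not from the paper).
NONE, WALL, WINDOW, OPENING = 0, 1, 2, 3
MAX_RANGE = 20.0  # metres


def cast_rays(position, segments, labels, n_bins=360, max_range=MAX_RANGE):
    """Cast one ray per azimuth bin; return (range, hit type) per bin.

    segments: (M, 2, 2) array of segment endpoints, labels: (M,) label codes.
    """
    p = np.asarray(position, dtype=float)
    a = segments[:, 0] - p                   # segment start relative to p
    s = segments[:, 1] - segments[:, 0]      # segment direction
    ang = np.deg2rad(np.arange(n_bins) * 360.0 / n_bins)
    d = np.stack([np.cos(ang), np.sin(ang)], axis=1)  # (B, 2) ray directions

    # Solve t*d = a + u*s with 2D cross products: t along ray, u along segment.
    denom = d[:, None, 0] * s[None, :, 1] - d[:, None, 1] * s[None, :, 0]
    a_x_s = a[:, 0] * s[:, 1] - a[:, 1] * s[:, 0]
    a_x_d = a[None, :, 0] * d[:, None, 1] - a[None, :, 1] * d[:, None, 0]
    with np.errstate(divide="ignore", invalid="ignore"):
        t = a_x_s[None, :] / denom
        u = a_x_d / denom
    tol = 1e-9
    valid = (np.abs(denom) > 1e-12) & (t > 0) & (u >= -tol) & (u <= 1 + tol)
    t = np.where(valid, t, np.inf)

    nearest = np.argmin(t, axis=1)           # closest segment per ray
    rng = t[np.arange(n_bins), nearest]
    hit = np.where(rng <= max_range, np.asarray(labels)[nearest], NONE)
    return np.minimum(rng, max_range), hit


def build_descriptor(rng, hit, max_range=MAX_RANGE, window=5):
    """Stack the five channels named in the abstract into a (5, B) array."""
    norm = rng / max_range
    # Circular central difference of the normalized range (assumed definition).
    grad = (np.roll(norm, -1) - np.roll(norm, 1)) / 2.0
    inv = 1.0 / np.maximum(rng, 1e-6)
    # Variance over a circular sliding window of `window` bins (assumed width).
    half = window // 2
    stack = np.stack([np.roll(norm, k) for k in range(-half, half + 1)])
    var = stack.var(axis=0)
    return np.stack([norm, hit.astype(float), grad, inv, var])
```

Because the bins are indexed by azimuth, a change in heading corresponds to a circular shift of the descriptor columns, which is the property that Scan Context style matching relies on.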
