We present PoCo, a novel end-to-end algorithm for indoor RGB-D place recognition, the task of identifying the most likely match for a given query frame within a reference database. The task is inherently challenging because perception sensors have a constrained field of view and a limited range. We propose a new network architecture that generalizes the recent Context of Clusters (CoCs) to extract global descriptors directly from noisy point clouds through end-to-end learning. We further integrate both color and geometric modalities into the point features to enhance the global descriptor representation. We evaluate on the public datasets ScanNet-PR and ARKit, with 807 and 5047 scenarios, respectively. PoCo achieves state-of-the-art (SOTA) performance: on ScanNet-PR, it reaches an R@1 of 64.63%, a 5.7% relative improvement over the best published result, CGis (61.12%); on ARKit, it reaches an R@1 of 45.12%, a 13.3% relative improvement over the best published result, CGis (39.82%). In addition, PoCo is more efficient than CGis at inference time (1.75× faster), and we demonstrate its effectiveness in recognizing places within a real-world laboratory environment.
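To make the retrieval setup and the reported numbers concrete, the following is a minimal sketch of descriptor-based place recognition and the R@1 metric. It is not PoCo's network: the two-dimensional descriptors, the place labels, the choice of cosine similarity, and the function names (`top1_match`, `recall_at_1`, `relative_improvement`) are all assumptions made for illustration. The sketch also assumes that a query counts as a hit when its top-ranked database entry carries the same place label. The last helper shows that the quoted 5.7% and 13.3% gains are relative improvements over the CGis R@1 values.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two descriptor vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top1_match(query, database):
    """Index of the database descriptor most similar to the query."""
    return max(range(len(database)),
               key=lambda i: cosine_similarity(query, database[i]))


def recall_at_1(queries, query_places, database, database_places):
    """Fraction of queries whose top-1 retrieval has the same place label."""
    hits = sum(
        1
        for query, place in zip(queries, query_places)
        if database_places[top1_match(query, database)] == place
    )
    return hits / len(queries)


def relative_improvement(new, old):
    """Relative gain of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


# Hypothetical toy descriptors standing in for learned global descriptors.
database = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
database_places = ["kitchen", "office", "hall"]
queries = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.2]]
query_places = ["kitchen", "office", "hall"]

r1 = recall_at_1(queries, query_places, database, database_places)
print(f"toy R@1 = {r1:.3f}")

# R@1 values quoted in the abstract (PoCo vs. CGis).
print(f"ScanNet-PR: {relative_improvement(64.63, 61.12):.1f}%")
print(f"ARKit: {relative_improvement(45.12, 39.82):.1f}%")
```

In the toy data the third query is deliberately closer to the "kitchen" descriptor than to its true place, so only two of the three queries are retrieved correctly.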