Mapping and localization in endoluminal cavities from colonoscopies or gastroscopies must cope with significant shape and illumination changes between reobservations of the same endoluminal location. Topological maps are better suited to this setting than geometric maps, which rely strongly on a fixed scene geometry, because they focus on visual place recognition, i.e. the capability to determine whether two video shots image the same location. We propose a topological mapping and localization system able to operate on real human colonoscopies. The map is a graph in which each node encodes a colon location as a set of real images of that location, and each edge represents traversability between two nodes. For close-in-time images, where scene changes are minor, place recognition can be handled successfully with recent transformer-based image-matching algorithms. However, under long-term changes, such as different colonoscopies of the same patient, feature-based matching fails. To address this, we propose a GeM global descriptor that achieves high recall under significant scene changes. Adding a Bayesian filter that processes the map graph boosts the accuracy of long-term place recognition, enabling relocalization in a previously built map. In the experiments, we construct a map during the withdrawal phase of a first colonoscopy. We then demonstrate relocalization within this map during a second colonoscopy of the same patient two weeks later. Code and models will be available upon acceptance.
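As a rough illustration of the global descriptor mentioned above, generalized-mean (GeM) pooling reduces each channel of a convolutional feature map to a single value via a power mean, and the resulting vector is compared by cosine similarity. The sketch below is a minimal pure-Python version and not the authors' implementation: the abstract does not specify the backbone, the exponent, or the training procedure, so `feature_map`, `p=3.0`, `eps`, and all function names here are assumptions chosen for illustration.

```python
import math


def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized-mean (GeM) pooling over each channel.

    feature_map: list of channels, each a flat list of activations
                 (hypothetical stand-in for a CNN feature map).
    p:           pooling exponent; p=1 is average pooling and
                 large p approaches max pooling.
    Returns one pooled value per channel.
    """
    pooled = []
    for channel in feature_map:
        # Clamp to eps so that fractional powers stay well defined.
        clamped = [max(v, eps) for v in channel]
        mean_pow = sum(v ** p for v in clamped) / len(clamped)
        pooled.append(mean_pow ** (1.0 / p))
    return pooled


def l2_normalize(vec):
    """Scale a vector to unit Euclidean norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def cosine_similarity(a, b):
    """Cosine similarity between two descriptors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))
```

In a place-recognition setting, a query image's descriptor would be compared with the descriptors stored in each map node, and the similarity scores would serve as the observation fed to the subsequent filtering stage.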
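The Bayesian filtering over the map graph described in the abstract can be sketched as a discrete Bayes filter whose states are the map nodes. This is only one plausible reading under stated assumptions: the abstract does not give the transition or observation model, so the `stay_prob` motion model, the per-node `likelihood` dictionaries (which could, for instance, be derived from descriptor similarities), and all function names below are hypothetical.

```python
def predict(belief, edges, stay_prob=0.5):
    """Motion step: each node keeps stay_prob of its probability mass
    and spreads the remainder evenly over its graph neighbours."""
    new_belief = {node: 0.0 for node in belief}
    for node, mass in belief.items():
        neighbours = edges.get(node, [])
        if not neighbours:
            # Isolated node: all mass stays in place.
            new_belief[node] += mass
            continue
        new_belief[node] += stay_prob * mass
        share = (1.0 - stay_prob) * mass / len(neighbours)
        for neighbour in neighbours:
            new_belief[neighbour] += share
    return new_belief


def update(belief, likelihood):
    """Observation step: weight each node by its likelihood and renormalize."""
    posterior = {node: belief[node] * likelihood[node] for node in belief}
    total = sum(posterior.values())
    if total == 0.0:
        # Uninformative observation: fall back to the predicted belief.
        return dict(belief)
    return {node: value / total for node, value in posterior.items()}


def localize(belief, edges, likelihood_sequence):
    """Run predict/update over a sequence of per-frame likelihoods."""
    for likelihood in likelihood_sequence:
        belief = update(predict(belief, edges), likelihood)
    return belief
```

Because probability mass can only move along edges, an isolated high-similarity match to a distant node is damped unless it is supported by consecutive frames, which is the intuition behind using the graph to improve precision over frame-by-frame retrieval.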