Unlike natural images, endoscopic images have no clear notion of an upright camera orientation. Endoscopic videos therefore often contain large rotational motions, which require keypoint detection and description algorithms to be robust to such conditions. Most classical methods achieve rotation-equivariant detection and invariant description by design. Many learning-based approaches, by contrast, learn to be robust only up to a certain degree of rotation. At the same time, learning-based methods often outperform classical approaches under moderate rotations. To address this shortcoming, we propose RIDE, a learning-based method for rotation-equivariant detection and invariant description. Following recent advances in group-equivariant learning, RIDE models rotation equivariance implicitly within its architecture. RIDE is trained in a self-supervised manner on a large curated collection of endoscopic images and requires no manual labeling of training data. We evaluate RIDE on surgical tissue tracking using the SuPeR dataset and on relative pose estimation using a repurposed version of the SCARED dataset. In addition, we perform explicit studies demonstrating its robustness to large rotations. Our comparison against recent learning-based and classical approaches shows that RIDE achieves new state-of-the-art performance on matching and relative pose estimation tasks and performs competitively on surgical tissue tracking.
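The abstract does not describe RIDE's actual architecture, so the following is not RIDE. It is a minimal sketch of the general group-equivariant idea, restricted to the cyclic group C4 of 90-degree rotations and using random, untrained kernels. The helper names (`correlate_same`, `c4_lifting`, `detection_score`, `describe`) are hypothetical. A "lifting" layer correlates the image with each kernel rotated four times. Rotating the input then rotates every feature map spatially and cyclically shifts the orientation axis. Pooling over that axis (max) gives an equivariant detection score map. Taking the FFT magnitude along that axis gives a descriptor that is invariant to the shift.

```python
import numpy as np

def correlate_same(image, kernel):
    """2D cross-correlation with zero padding; output has the image's shape.
    Assumes an odd-sized square kernel."""
    r = kernel.shape[0] // 2
    padded = np.pad(image, r)  # zero padding on all sides
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + 2 * r + 1, j:j + 2 * r + 1] * kernel)
    return out

def c4_lifting(image, kernels):
    """Lift an image to C4: correlate it with every kernel rotated by
    0, 90, 180 and 270 degrees. Returns shape (num_kernels, 4, H, W)."""
    return np.stack([
        np.stack([correlate_same(image, np.rot90(k, g)) for g in range(4)])
        for k in kernels
    ])

def detection_score(features):
    """Max over the rotation axis, then sum over kernels. Rotating the
    (square) input by 90 degrees rotates this map the same way."""
    return np.abs(features).max(axis=1).sum(axis=0)

def describe(features, i, j):
    """Descriptor at pixel (i, j): FFT magnitude along the rotation axis.
    A 90-degree input rotation only cyclically shifts that axis at the
    corresponding pixel, and FFT magnitudes ignore cyclic shifts."""
    return np.abs(np.fft.fft(features[:, :, i, j], axis=1)).ravel()

rng = np.random.default_rng(0)
image = rng.random((16, 16))               # square image (needed for rot90 mapping)
kernels = rng.standard_normal((3, 5, 5))   # hypothetical untrained filters

feats = c4_lifting(image, kernels)
feats_rot = c4_lifting(np.rot90(image), kernels)

# Equivariant detection: the score map rotates with the image.
print(np.allclose(detection_score(feats_rot), np.rot90(detection_score(feats))))

# Invariant description: under np.rot90, pixel (p, q) moves to (N - 1 - q, p).
N, (p, q) = image.shape[0], (4, 11)
print(np.allclose(describe(feats, p, q), describe(feats_rot, N - 1 - q, p)))
```

Both checks should print `True`. With this design, equivariance holds exactly for 90-degree steps only. Finer rotation groups need interpolated kernel rotations, under which equivariance is only approximate.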