Inferring the 3D structure of endoscopic scenes from images is exceedingly challenging. Beyond deformation and view-dependent lighting, tubular organs such as the colon pose additional problems because their anatomy is self-occluding and repetitive. In this paper, we propose SimCol, a synthetic dataset for camera pose estimation in colonoscopy, together with a novel method that explicitly learns a bimodal distribution to predict the endoscope pose. Our dataset replicates real colonoscope motion and exposes the shortcomings of existing methods. We release 18k RGB images from simulated colonoscopy with corresponding depth maps and camera poses, and we make our Unity-based data generation environment publicly available. We evaluate several camera pose prediction methods and show that models trained on our data generalize to real colonoscopy sequences and that our bimodal approach outperforms prior unimodal work.