Endoscopic surgery relies on two-dimensional views, which makes depth perception and instrument manipulation difficult for surgeons. Monocular Visual Simultaneous Localization and Mapping (MVSLAM) has emerged as a promising solution, but its use in endoscopic procedures is hampered by hardware limitations, namely the reliance on a single monocular camera and the absence of odometry sensors. This study presents BodySLAM, a robust deep learning-based MVSLAM approach that addresses these challenges through three key components: CycleVO, a novel unsupervised monocular pose estimation module; the state-of-the-art Zoe architecture, integrated for monocular depth estimation; and a 3D reconstruction module that builds a coherent surgical map. The approach is evaluated on three publicly available datasets (Hamlyn, EndoSLAM, and SCARED) spanning laparoscopy, gastroscopy, and colonoscopy scenarios, and is benchmarked against four state-of-the-art methods. The results show that CycleVO achieves competitive performance with the lowest inference time among the pose estimation methods while maintaining robust generalization, and that Zoe significantly outperforms existing algorithms for depth estimation in endoscopy. BodySLAM's strong performance across diverse endoscopic scenarios demonstrates its potential as a viable MVSLAM solution for endoscopic applications.