Monocular depth estimation is critical for endoscopists to perceive the spatial layout of surgical sites and navigate them in 3D. However, most existing methods ignore geometric structural consistency, which inevitably degrades performance and distorts the 3D reconstruction. To address this issue, we introduce a gradient loss to penalize ambiguous edge fluctuations around stepped edge structures and a normal loss to explicitly express sensitivity to frequent small structures, and we propose a geometric consistency loss that spreads spatial information across the sample grids to constrain the global geometric anatomical structures. In addition, we develop a synthetic RGB-Depth dataset that captures anatomical structures under reflections and illumination variations. The proposed method is extensively validated across different datasets and clinical images and achieves mean RMSE values of 0.066 (stomach), 0.029 (small intestine), and 0.139 (colon) on the EndoSLAM dataset. In a generalizability evaluation on the ColonDepth dataset, it achieves mean RMSE values of 12.604 (T1-L1), 9.930 (T2-L2), and 13.893 (T3-L3). The experimental results show that our method outperforms previous state-of-the-art competitors and generates more consistent depth maps and reasonable anatomical structures. The quality of the intraoperative 3D structure perception that the proposed method recovers from endoscopic videos meets the accuracy requirements of video-CT registration algorithms for endoscopic navigation. The dataset and the source code will be available at https://github.com/YYM-SIA/LINGMI-MR.
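As a rough illustration of two quantities named above, the following sketch computes an RMSE between a predicted and a reference depth map, together with a generic gradient-difference penalty. It is not the formulation used in this work: the function names `rmse` and `gradient_l1`, the forward-difference gradient, and the L1 averaging are all assumptions chosen for simplicity, and depth maps are represented as plain nested lists.

```python
import math


def rmse(pred, gt):
    """Root-mean-square error between two equally sized 2D depth maps."""
    sq = [(p - g) ** 2 for pr, gr in zip(pred, gt) for p, g in zip(pr, gr)]
    return math.sqrt(sum(sq) / len(sq))


def gradient_l1(pred, gt):
    """Mean absolute difference between forward-difference gradients.

    Hypothetical stand-in for a gradient loss: it penalizes predictions
    whose horizontal and vertical depth changes differ from the reference.
    """
    h, w = len(pred), len(pred[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:  # horizontal forward difference
                dp = pred[y][x + 1] - pred[y][x]
                dg = gt[y][x + 1] - gt[y][x]
                total += abs(dp - dg)
                count += 1
            if y + 1 < h:  # vertical forward difference
                dp = pred[y + 1][x] - pred[y][x]
                dg = gt[y + 1][x] - gt[y][x]
                total += abs(dp - dg)
                count += 1
    return total / count


# Toy 2x2 depth maps (made-up values, not data from the paper).
gt = [[1.0, 1.0], [2.0, 2.0]]
pred = [[1.0, 1.0], [2.0, 3.0]]
print(rmse(pred, gt), gradient_l1(pred, gt))
```

In this toy case the single erroneous pixel affects both measures, but the gradient term responds to the mismatch in depth change across the two neighbouring edges rather than to the absolute error, which is why such terms are typically combined with a pointwise loss.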