Depth estimation is a crucial step for image-guided intervention in robotic surgery and laparoscopic imaging systems. Because per-pixel depth ground truth is difficult to acquire for laparoscopic image data, supervised depth estimation can rarely be applied to surgical applications. As an alternative, self-supervised methods have been introduced that train depth estimators using only synchronized stereo image pairs. However, most recent work has focused on left-right consistency in 2D and has ignored the valuable 3D information inherent to objects in real-world coordinates, so the left-right 3D geometric structural consistency is not fully utilized. To overcome this limitation, we present M3Depth, a self-supervised depth estimator that leverages the 3D geometric structural information hidden in stereo pairs while retaining monocular inference. The method also masks out border regions that are unseen in at least one of the stereo images, which strengthens the correspondences between left and right images in their overlapping areas. Extensive experiments show that our method outperforms previous self-supervised approaches by a large margin on both a public dataset and a newly acquired dataset, indicating good generalization across different samples and laparoscopes. Code and data are available at https://github.com/br0202/M3Depth.
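To make the two ideas above concrete (3D structure recovered from a stereo pair, and masking of pixels unseen in the other view), the following is a minimal sketch and not the M3Depth implementation. It assumes a rectified stereo pair with a pinhole camera model, a positive left-image disparity in pixels, and a simple L1 distance between 3D points. All function names and parameters (`disparity_to_depth`, `backproject`, `visible_in_right`, `masked_3d_error`, `fx`, `fy`, `cx`, `cy`) are hypothetical and chosen only for illustration.

```python
def disparity_to_depth(disparity_px, focal_px, baseline):
    """Rectified-stereo relation Z = f * B / d (assumed camera model)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline / disparity_px


def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with the given depth to a 3D point in camera coordinates."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def visible_in_right(u, disparity_px):
    """A left pixel at column u maps to column u - d in the right image.

    It is treated as visible only if that column is not left of the image border.
    """
    return u - disparity_px >= 0


def masked_3d_error(points_a, points_b, mask):
    """Mean L1 distance between paired 3D points, counting only masked-in pixels.

    This is an illustrative discrepancy measure, not the loss used in the paper.
    """
    total, count = 0.0, 0
    for p, q, keep in zip(points_a, points_b, mask):
        if keep:
            total += sum(abs(pi - qi) for pi, qi in zip(p, q))
            count += 1
    return total / count if count else 0.0


# Usage: one image row of width 4 with a constant disparity of 2 pixels.
mask = [visible_in_right(u, 2.0) for u in range(4)]
depth = disparity_to_depth(2.0, focal_px=100.0, baseline=5.0)
point = backproject(0, 0, depth, fx=100.0, fy=100.0, cx=1.5, cy=0.5)
```

In this sketch the two leftmost columns are masked out because their correspondences fall outside the right image, so only the overlapping area contributes to the 3D discrepancy.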