Depth estimation in surgical video plays a crucial role in many image-guided surgical procedures. However, creating ground-truth depth maps for surgical video is difficult and time-consuming, due in part to inconsistent brightness and noise in the surgical scene. Accurate and robust self-supervised estimation of depth and camera ego-motion is therefore attracting growing attention from the computer vision community. Although several self-supervised methods alleviate the need for ground-truth depth maps and poses, they still require known camera intrinsic parameters, which are often missing or not recorded. Moreover, the camera intrinsic prediction methods in existing work depend heavily on dataset quality. In this work, we aim to build a self-supervised depth and ego-motion estimation system that predicts not only accurate depth maps and camera poses but also camera intrinsic parameters. We propose a cost-volume-based supervision scheme that gives the system auxiliary supervision for camera parameter prediction. Experimental results show that the proposed method improves the accuracy of the estimated camera parameters, ego-motion, and depth.