Single-view depth estimation can be remarkably effective when enough ground-truth depth data is available for supervised training. However, in some scenarios, notably endoscopy in medicine, such data cannot be obtained. In these cases, multi-view self-supervision and synthetic-to-real transfer serve as alternatives, but with a considerable performance reduction compared to the supervised case. Instead, we propose a single-view self-supervised method that achieves performance similar to the supervised case. In some medical devices, such as endoscopes, the camera and light sources are co-located at a small distance from the target surfaces. We can therefore exploit the fact that, for any given albedo and surface orientation, pixel brightness is inversely proportional to the square of the distance to the surface, which provides a strong single-view self-supervisory signal. In our experiments, our self-supervised models deliver accuracies comparable to those of fully supervised ones, while being applicable without ground-truth depth data.
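The inverse-square relation between pixel brightness and surface distance can be illustrated with a toy sketch. This is not the method's actual loss or implementation. It assumes a Lambertian surface lit by a single point light co-located with the camera, and the names `render_brightness`, `photometric_loss`, and `light_power` are hypothetical, introduced only for illustration.

```python
import numpy as np

def render_brightness(depth, albedo, cos_theta, light_power=1.0):
    """Brightness under a point light co-located with the camera.

    Assumption: Lambertian reflectance, so brightness scales with
    albedo * cos(theta) and falls off with the square of the distance.
    """
    return light_power * albedo * cos_theta / depth**2

def photometric_loss(image, depth_pred, albedo, cos_theta, light_power=1.0):
    """Mean absolute difference between observed and re-rendered brightness."""
    rendered = render_brightness(depth_pred, albedo, cos_theta, light_power)
    return float(np.mean(np.abs(image - rendered)))

# Toy 2x2 scene: constant albedo, surfaces facing the camera (cos_theta = 1).
depth_true = np.array([[1.0, 2.0], [4.0, 0.5]])
albedo = np.full_like(depth_true, 0.8)
cos_theta = np.ones_like(depth_true)

# "Observed" image synthesized from the true depth.
image = render_brightness(depth_true, albedo, cos_theta)

# The loss vanishes for the true depth and grows for a wrong depth estimate.
loss_correct = photometric_loss(image, depth_true, albedo, cos_theta)
loss_wrong = photometric_loss(image, 1.5 * depth_true, albedo, cos_theta)
```

In this sketch, doubling the distance quarters the brightness (compare `image[0, 1]` with `image[0, 0]`), so a depth prediction can be scored against the input image alone, without ground-truth depth. In practice the albedo and surface orientation are not known in advance and would have to be estimated or otherwise accounted for.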