Single-view depth estimation can be remarkably effective when enough ground-truth depth data is available for supervised training. However, in some scenarios, notably medical endoscopy, such data cannot be obtained. In these cases, multi-view self-supervision and synthetic-to-real transfer serve as alternatives, but with a considerable loss in performance relative to the supervised case. Instead, we propose a single-view self-supervised method that achieves performance similar to the supervised case. In some medical devices, such as endoscopes, the camera and light sources are co-located and operate at a small distance from the target surfaces. We can therefore exploit the fact that, for any given albedo and surface orientation, pixel brightness is inversely proportional to the square of the distance to the surface, which provides a strong single-view self-supervisory signal. In our experiments, our self-supervised models deliver accuracies comparable to those of fully supervised ones, while being applicable without ground-truth depth data.
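To make the inverse-square relation concrete, the following minimal sketch illustrates how brightness could act as a supervisory signal for depth. It is not the method proposed here: it assumes a simple Lambertian surface lit by a point source co-located with the camera, and the names `render_brightness`, `photometric_loss`, `cos_theta`, and `light_power` are hypothetical, introduced only for illustration.

```python
def render_brightness(depth, albedo=1.0, cos_theta=1.0, light_power=1.0):
    """Brightness of one pixel under an assumed Lambertian model.

    Assumption: the light source sits at the camera, so the light travels
    the same distance `depth` to the surface and the received intensity
    falls off as 1 / depth**2 for fixed albedo and surface orientation.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    return light_power * albedo * cos_theta / depth ** 2


def photometric_loss(observed, predicted_depths, **shading):
    """Mean absolute difference between observed and re-rendered brightness.

    `shading` forwards the assumed albedo, cos_theta, and light_power values.
    """
    rendered = [render_brightness(d, **shading) for d in predicted_depths]
    return sum(abs(o - r) for o, r in zip(observed, rendered)) / len(observed)


# Toy "image": three pixels at known depths, rendered with default shading.
true_depths = [1.0, 2.0, 4.0]
observed = [render_brightness(d) for d in true_depths]

# A correct depth prediction reproduces the observed brightness exactly;
# a wrong one (last pixel placed twice as far away) does not.
loss_correct = photometric_loss(observed, true_depths)
loss_wrong = photometric_loss(observed, [1.0, 2.0, 8.0])
```

In this toy setting, doubling the depth of a pixel quarters its rendered brightness, so the loss is zero only when the predicted depths explain the observed image. A real system would also have to estimate albedo and surface orientation, which the sketch holds fixed.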