Relative monocular depth estimation, which infers depth up to an unknown shift and scale from a single image, is an active research topic. Recent deep learning models, trained on large and varied meta-datasets, now perform excellently on natural images. However, few datasets provide ground-truth depth for endoscopic images, which makes training such models from scratch infeasible. This work investigates transferring these models to the surgical domain. It presents a simple, effective way to improve on standard supervision using temporal consistency self-supervision. We show that, when transferring to the low-data regime of endoscopy, temporal consistency significantly improves on supervised training alone and outperforms the prevalent self-supervision technique for this task. In addition, we show that our method drastically outperforms the state-of-the-art method from within the domain of endoscopy. We also release our code, model and ensembled meta-dataset, Meta-MED, establishing a strong benchmark for future work.
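To make "depth up to shift and scale" concrete: a relative-depth prediction is usually compared with ground truth only after fitting a per-image scale $s$ and shift $t$ that minimise $\sum_i (s\,d_i + t - d^*_i)^2$. Here $d_i$ is the prediction and $d^*_i$ the ground truth at valid pixel $i$. The sketch below shows this closed-form least-squares fit in plain Python. It illustrates the common alignment step and is not the evaluation code released with this work. It assumes flattened lists of valid pixel values with nonzero ground truth. The function names `align_scale_shift` and `aligned_abs_rel` are hypothetical. Many pipelines apply the fit in inverse-depth (disparity) space, and this sketch simply aligns whatever representation it is given.

```python
from typing import List, Sequence, Tuple


def align_scale_shift(pred: Sequence[float], gt: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical helper: least-squares (s, t) minimising sum((s*pred + t - gt)^2)."""
    if len(pred) != len(gt) or len(pred) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    n = len(pred)
    sx, sy = sum(pred), sum(gt)
    sxx = sum(p * p for p in pred)
    sxy = sum(p * g for p, g in zip(pred, gt))
    denom = n * sxx - sx * sx
    if denom == 0:
        # A constant prediction carries no relative structure, so scale is undetermined.
        raise ValueError("prediction is constant; scale is undetermined")
    s = (n * sxy - sx * sy) / denom  # closed-form slope of the 1-D linear regression
    t = (sy - s * sx) / n            # intercept: residuals sum to zero
    return s, t


def aligned_abs_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Hypothetical metric: mean absolute relative error after scale-and-shift alignment.

    Assumes every ground-truth value is nonzero (e.g. invalid pixels already masked out).
    """
    s, t = align_scale_shift(pred, gt)
    aligned: List[float] = [s * p + t for p in pred]
    return sum(abs(a - g) / g for a, g in zip(aligned, gt)) / len(gt)


if __name__ == "__main__":
    pred = [1.0, 2.0, 3.0, 4.0]
    gt = [2.5, 4.5, 6.5, 8.5]  # exactly 2 * pred + 0.5
    print(align_scale_shift(pred, gt))  # recovers s = 2.0, t = 0.5
    print(aligned_abs_rel(pred, gt))    # 0.0: the prediction is correct up to shift and scale
```

Because the error is measured only after this fit, a prediction that is off by a global scale and offset scores perfectly. This is exactly the ambiguity that "relative" depth leaves unresolved.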