Depth-from-defocus (DFD), which models the relationship between depth and the defocus pattern in images, has shown promising performance in depth estimation. Recently, several self-supervised works have tried to overcome the difficulty of acquiring accurate ground-truth depth. However, they depend on all-in-focus (AIF) images, which cannot be captured in real-world scenarios, and this limitation hinders the application of DFD methods. To tackle this issue, we propose a completely self-supervised framework that estimates depth purely from a sparse focal stack. We show that our framework circumvents the need for depth and AIF image ground-truth and produces superior predictions, thus closing the gap between the theoretical success of DFD works and their application in the real world. In particular, we propose (i) a more realistic setting for DFD tasks, in which no depth or AIF image ground-truth is available; and (ii) a novel self-supervised framework that provides reliable predictions of depth and the AIF image under this challenging setting. The proposed framework uses a neural model to predict the depth and AIF image, and an optical model to validate and refine these predictions. We verify our framework on three benchmark datasets with rendered and real focal stacks. Qualitative and quantitative evaluations show that our method provides a strong baseline for self-supervised DFD tasks.
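The abstract does not spell out the optical model. As a minimal sketch, assuming a thin-lens circle-of-confusion (CoC) model and a Gaussian point-spread function, the optical check can be read as a reconstruction loss: re-render the focal stack from a predicted AIF image and depth map, then compare it with the captured stack, which requires no ground truth. All function names, the layered rendering scheme, the CoC-to-sigma conversion, and the camera parameters below are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def coc_diameter(depth, focus_dist, focal_len, aperture):
    """Thin-lens circle-of-confusion diameter on the sensor.

    Assumes all distances are in metres and focus_dist > focal_len.
    """
    return aperture * np.abs(depth - focus_dist) / depth * focal_len / (focus_dist - focal_len)


def gaussian_blur(img, sigma):
    """Separable Gaussian blur with edge padding; sigma is in pixels."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / max(sigma, 1e-6)) ** 2)
    k /= k.sum()
    padded = np.pad(img, radius, mode="edge")
    # Convolve each row, then each column; "valid" mode trims the padding away.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)


def render_focal_slice(aif, depth, focus_dist, focal_len, aperture, pixel_size, n_layers=16):
    """Hypothetical layered renderer: quantise depth, blur the AIF image once per
    occupied layer, and copy that layer's pixels into the output.
    Occlusion at depth edges is ignored."""
    edges = np.linspace(depth.min(), depth.max() + 1e-9, n_layers + 1)
    out = np.zeros_like(aif)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (depth >= lo) & (depth < hi)
        if not mask.any():
            continue
        coc_px = coc_diameter(0.5 * (lo + hi), focus_dist, focal_len, aperture) / pixel_size
        # Assumption: Gaussian PSF whose sigma is half the CoC diameter.
        out[mask] = gaussian_blur(aif, coc_px / 2.0)[mask]
    return out


def focal_stack_loss(aif, depth, stack, focus_dists, **optics):
    """Mean L1 error between the observed stack and the stack re-rendered
    from a predicted (AIF, depth) pair."""
    errors = [np.mean(np.abs(render_focal_slice(aif, depth, fd, **optics) - obs))
              for fd, obs in zip(focus_dists, stack)]
    return float(np.mean(errors))


# Toy scene: random texture, left half at 0.5 m, right half at 2.0 m.
rng = np.random.default_rng(0)
aif_true = rng.random((32, 32))
depth_true = np.full((32, 32), 2.0)
depth_true[:, :16] = 0.5
optics = dict(focal_len=0.05, aperture=0.01, pixel_size=5e-5)
focus_dists = [0.5, 2.0]
stack = [render_focal_slice(aif_true, depth_true, fd, **optics) for fd in focus_dists]

# A correct prediction reproduces the stack; a wrong depth map does not.
loss_true = focal_stack_loss(aif_true, depth_true, stack, focus_dists, **optics)
loss_wrong = focal_stack_loss(aif_true, np.full((32, 32), 1.0), stack, focus_dists, **optics)
print(loss_true, loss_wrong)
```

In a full framework the AIF image and depth map would come from the neural model and this loss would be minimized by gradient descent, so a practical version would be written in a differentiable framework and would need occlusion-aware rendering; this NumPy sketch only illustrates why a focal stack alone can supervise both predictions.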