Depth estimation from light field (LF) images is a fundamental step for numerous applications. Recently, learning-based methods have achieved higher accuracy and efficiency than traditional methods. However, obtaining sufficient depth labels for supervised training is costly. In this paper, we propose an unsupervised framework to estimate depth from LF images. First, we design a disparity estimation network (DispNet) with a coarse-to-fine structure to predict disparity maps from different view combinations. DispNet explicitly performs multi-view feature matching to learn correspondences effectively. Because occlusions can violate photo-consistency, we introduce an occlusion prediction network (OccNet) to predict occlusion maps, which serve as element-wise weights of the photometric loss to handle occlusions and assist disparity learning. Given the disparity maps estimated from multiple input combinations, we then propose a disparity fusion strategy based on the estimated errors, with effective occlusion handling, to obtain a more accurate final disparity map. Experimental results demonstrate that our method achieves superior performance on both dense and sparse LF images, and shows better robustness and generalization on real-world LF images than other methods.
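To make the two occlusion-related ideas above concrete, the following is a minimal NumPy sketch of an occlusion-weighted photometric loss and an error-based fusion of candidate disparity maps. It is an illustration under stated assumptions, not the implementation used in this work: the function names, the choice of an L1 photometric error, the normalization by the sum of weights, and the per-pixel minimum-error selection rule are all hypothetical, and the side view is assumed to be already warped to the center view.

```python
import numpy as np


def occlusion_weighted_photometric_loss(center, warped, occ_weight, eps=1e-8):
    """Weighted mean absolute photometric error (assumed L1 form).

    center, warped: (H, W) or (H, W, C) arrays; `warped` is a side view
        already warped to the center view with the predicted disparity.
    occ_weight: (H, W) array in [0, 1]; values near 0 mark pixels that are
        assumed to be occluded and should contribute little to the loss.
    """
    err = np.abs(center - warped)
    if err.ndim == 3:
        err = err.mean(axis=-1)  # average the error over color channels
    # Normalize by the total weight so the loss does not shrink simply
    # because many pixels are down-weighted (an assumption of this sketch).
    return float((occ_weight * err).sum() / (occ_weight.sum() + eps))


def fuse_disparities_by_error(disparities, errors):
    """Per pixel, keep the candidate disparity with the smallest error.

    disparities, errors: (K, H, W) arrays holding K candidate disparity
        maps and their estimated per-pixel errors.
    Returns an (H, W) fused disparity map.
    """
    best = np.argmin(errors, axis=0)  # (H, W) index of the best candidate
    return np.take_along_axis(disparities, best[None], axis=0)[0]
```

In this sketch, a pixel whose weight is zero has no influence on the loss, which is one simple way for a photo-consistency violation at an occluded pixel to be kept out of the disparity gradient. A soft fusion, for example a weighted average with weights derived from the errors, would be an equally plausible reading of "fusion based on the estimated errors".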