Light field (LF) depth estimation plays a crucial role in many LF-based applications. Existing LF depth estimation methods treat depth estimation as a regression problem and supervise training with a pixel-wise L1 loss. However, the disparity map is only a sub-space projection (i.e., an expectation) of the disparity distribution, and it is this distribution that models need to learn. In this paper, we propose a simple yet effective method that learns the sub-pixel disparity distribution by fully utilizing the power of deep networks, especially for LF images with narrow baselines. We construct the cost volume at the sub-pixel level to produce a finer disparity distribution, and we design an uncertainty-aware focal loss to supervise the predicted disparity distribution toward the ground truth. Extensive experiments demonstrate the effectiveness of our method, which significantly outperforms recent state-of-the-art LF depth algorithms on the HCI 4D LF Benchmark on all four accuracy metrics (i.e., BadPix 0.01, BadPix 0.03, BadPix 0.07, and MSE $\times$100). The code and model of the proposed method are available at \url{https://github.com/chaowentao/SubFocal}.
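To make the relation between a disparity distribution and a disparity map concrete, the following is a minimal NumPy sketch, not the released implementation. It assumes a matching-cost volume of shape (D, H, W) in which lower cost means a better match, a softmax over the disparity axis, and a sampling step of 0.5 pixels. The function names, the disparity range, and the toy cost are all hypothetical and chosen for illustration only.

```python
import numpy as np

def disparity_candidates(d_min, d_max, step):
    """Evenly spaced disparity hypotheses; step < 1 gives sub-pixel sampling."""
    n = int(round((d_max - d_min) / step)) + 1
    return np.linspace(d_min, d_max, n)

def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def expected_disparity(cost, candidates):
    """Turn a (D, H, W) cost volume into a per-pixel distribution and its mean.

    Assumption: lower cost means a better match, so the cost is negated
    before the softmax.
    """
    prob = softmax(-cost, axis=0)                        # (D, H, W), sums to 1 over D
    disp = np.tensordot(candidates, prob, axes=(0, 0))   # (H, W) expectation
    return prob, disp

# Hypothetical setup: disparities in [-4, 4] sampled every 0.5 pixel.
candidates = disparity_candidates(-4.0, 4.0, 0.5)

# Toy cost volume for a 2x2 image whose true disparity (1.25) lies between
# two sampled hypotheses (1.0 and 1.5).
true_disp = np.full((2, 2), 1.25)
cost = 10.0 * (candidates[:, None, None] - true_disp[None]) ** 2
prob, disp = expected_disparity(cost, candidates)

# A two-peaked distribution: its expectation matches neither peak, which is
# why the disparity map alone does not describe the distribution.
bimodal = np.zeros_like(candidates)
bimodal[np.isclose(candidates, -2.0)] = 0.5
bimodal[np.isclose(candidates, 2.0)] = 0.5
bimodal_mean = float((bimodal * candidates).sum())
```

In this toy case the expectation recovers a disparity that falls between sampled hypotheses, while the two-peaked example shows how very different distributions can collapse to the same disparity value, so a loss on the disparity map alone cannot tell them apart.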