Monocular Depth Estimation (MDE) is a fundamental problem in computer vision with numerous applications. Recently, LIDAR-supervised methods have achieved remarkable per-pixel depth accuracy in outdoor scenes. However, significant errors are typically found in the proximity of depth discontinuities, i.e., depth edges, and these errors often hinder the performance of depth-dependent applications that are sensitive to such inaccuracies, e.g., novel view synthesis and augmented reality. Since direct supervision for the location of depth edges is typically unavailable in sparse LIDAR-based scenes, encouraging the MDE model to produce correct depth edges is not straightforward. To the best of our knowledge, this paper is the first attempt to address the depth edges issue for LIDAR-supervised scenes. In this work, we propose to learn to detect the location of depth edges from densely-supervised synthetic data, and to use the detected edges to generate supervision for the depth edges in the MDE training. Despite the 'domain gap' between synthetic and real data, we show that depth edges that are estimated directly are significantly more accurate than the ones that emerge indirectly from the MDE training. To quantitatively evaluate our approach, and due to the lack of depth edges ground truth in LIDAR-based scenes, we manually annotated subsets of the KITTI and the DDAD datasets with depth edges ground truth. We demonstrate significant gains in the accuracy of the depth edges with comparable per-pixel depth accuracy on several challenging datasets.