Monocular depth estimation is an ill-posed task: a single 2D image usually does not contain sufficient information to determine the depth of the objects in it. It therefore differs from other tasks (e.g., classification and segmentation) in many ways. In this paper, we find that self-supervised monocular depth estimation exhibits direction sensitivity and environmental dependency in its feature representation. However, current backbones borrowed from other tasks pay little attention to handling different types of environmental information, which limits overall depth accuracy. To bridge this gap, we propose a new Direction-aware Cumulative Convolution Network (DaCCN), which improves the depth feature representation in two respects. First, we propose a direction-aware module that learns to adjust feature extraction in each direction, facilitating the encoding of different types of information. Second, we design a new cumulative convolution to aggregate important environmental information more efficiently. Experiments show that our method achieves significant improvements on three widely used benchmarks, KITTI, Cityscapes, and Make3D, setting a new state of the art on these benchmarks with all three types of self-supervision.