Weakly-Supervised Monocular Depth Estimation with Resolution-Mismatched Data

Depth estimation from a single image is an active research topic in computer vision. The most accurate approaches are based on fully supervised learning models, which rely on a large amount of dense, high-resolution (HR) ground-truth depth maps. In practice, however, color images are usually captured at a much higher resolution than depth maps, which leads to a resolution mismatch. In this paper, we propose a novel weakly-supervised framework that trains a monocular depth estimation network to generate HR depth maps under resolution-mismatched supervision, i.e., the inputs are HR color images and the ground truth consists of low-resolution (LR) depth maps. The proposed framework is composed of a weight-sharing monocular depth estimation network and a depth reconstruction network used for distillation. Specifically, for the monocular depth estimation network, the input color image is first downsampled to obtain an LR version with the same resolution as the ground-truth depth. Then, both the HR and LR color images are fed into the monocular depth estimation network to obtain the corresponding estimated depth maps. We introduce three losses to train the network: 1) a reconstruction loss between the estimated LR depth and the ground-truth LR depth; 2) a reconstruction loss between the downsampled estimated HR depth and the ground-truth LR depth; 3) a consistency loss between the estimated LR depth and the downsampled estimated HR depth. In addition, we design a depth-to-depth reconstruction network. Through a distillation loss, the features of the two networks are kept structurally consistent in affinity space, which ultimately improves the performance of the estimation network. Experimental results demonstrate that our method outperforms unsupervised and semi-supervised learning-based schemes and is competitive with, or even better than, supervised ones.
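
The three supervision terms and the affinity-space distillation term described above can be illustrated with a minimal NumPy sketch. The abstract does not specify the loss functions, the downsampling operator, or the affinity definition, so everything concrete here is an assumption made for illustration: the L1 distance, average-pooling downsampling, cosine-similarity affinities, and all function names (`downsample`, `weak_supervision_losses`, `affinity`, `distillation_loss`) are hypothetical rather than taken from the paper.

```python
import numpy as np


def downsample(depth, scale):
    """Average-pool a 2-D depth map by an integer factor (assumed operator)."""
    h, w = depth.shape
    assert h % scale == 0 and w % scale == 0, "size must be divisible by scale"
    # Split each axis into (blocks, scale) and average inside every block.
    return depth.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))


def l1(a, b):
    """Mean absolute difference (assumed choice of distance)."""
    return float(np.abs(a - b).mean())


def weak_supervision_losses(pred_hr, pred_lr, gt_lr, scale):
    """Return the three terms listed in the abstract.

    pred_hr: depth estimated from the HR color image
    pred_lr: depth estimated from the downsampled (LR) color image
    gt_lr:   LR ground-truth depth
    """
    pred_hr_down = downsample(pred_hr, scale)
    return {
        "rec_lr": l1(pred_lr, gt_lr),             # 1) LR estimate vs. LR ground truth
        "rec_hr": l1(pred_hr_down, gt_lr),        # 2) downsampled HR estimate vs. LR ground truth
        "consistency": l1(pred_lr, pred_hr_down)  # 3) LR estimate vs. downsampled HR estimate
    }


def affinity(features, eps=1e-8):
    """Pairwise cosine similarity between spatial positions (assumed affinity).

    features: array of shape (C, H, W); returns an (H*W, H*W) matrix.
    """
    c = features.shape[0]
    flat = features.reshape(c, -1)
    flat = flat / (np.linalg.norm(flat, axis=0, keepdims=True) + eps)
    return flat.T @ flat


def distillation_loss(est_features, rec_features):
    """L1 gap between the affinity matrices of the two networks (assumed form)."""
    return l1(affinity(est_features), affinity(rec_features))
```

In a training loop these terms would typically be combined as a weighted sum. The weights are not given in the abstract and would need to be chosen separately.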
