RDFC-GAN: RGB-Depth Fusion CycleGAN for Indoor Depth Completion

Source: arXiv. Haowen Wang and Zhengping Che contributed equally. Under review. An earlier version was accepted at CVPR 2022 (arXiv:2203.10856).

Raw depth images captured by indoor depth sensors usually have large regions of missing depth values because of inherent limitations such as the inability to perceive transparent objects and a limited distance range. Incomplete depth maps burden many downstream vision tasks, and a growing number of depth completion methods have been proposed to alleviate this issue. While most existing methods can generate accurate dense depth maps from sparse, uniformly sampled depth maps, they are not suited to completing large contiguous regions of missing depth values, which are common and critical in images captured in indoor environments. To overcome these challenges, we design a novel two-branch end-to-end fusion network named RDFC-GAN, which takes a pair of RGB and incomplete depth images as input and predicts a dense, completed depth map. The first branch employs an encoder-decoder structure that adheres to the Manhattan world assumption and uses normal maps derived from RGB-D information as guidance to regress local dense depth values from the raw depth map. In the other branch, we propose an RGB-depth fusion CycleGAN that transfers the RGB image to a fine-grained textured depth map. We adopt adaptive fusion modules named W-AdaIN to propagate features across the two branches, and we append a confidence fusion head that fuses the two branch outputs into the final depth map. Extensive experiments on NYU-Depth V2 and SUN RGB-D demonstrate that our proposed method clearly improves depth completion performance, especially in a more realistic indoor setting, with the help of our proposed pseudo depth maps in training.
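
The abstract does not say how the confidence fusion head combines the two branch outputs. As a minimal sketch of one plausible reading, the snippet below assumes each branch emits a dense depth map plus a per-pixel confidence map, and that the final depth is a per-pixel softmax-weighted sum of the two depth maps. The function name `confidence_fusion` and all variable names are hypothetical and are not taken from the paper or its code.

```python
import numpy as np


def confidence_fusion(depth_a, conf_a, depth_b, conf_b):
    """Fuse two dense depth maps using per-pixel confidence logits.

    Assumption (not stated in the abstract): the confidences are
    unnormalized logits that are softmax-normalized across the two
    branches at every pixel before weighting the depth predictions.
    """
    logits = np.stack([conf_a, conf_b], axis=0)           # shape (2, H, W)
    logits = logits - logits.max(axis=0, keepdims=True)   # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=0, keepdims=True)
    return weights[0] * depth_a + weights[1] * depth_b


# Toy inputs: a 2x2 "local" branch output and a 2x2 "GAN" branch output.
depth_local = np.full((2, 2), 2.0)
depth_gan = np.full((2, 2), 4.0)

# Equal confidence everywhere: the fused map is the per-pixel average.
fused_equal = confidence_fusion(
    depth_local, np.zeros((2, 2)), depth_gan, np.zeros((2, 2))
)

# Much higher confidence in the local branch: the fused map follows it.
fused_local = confidence_fusion(
    depth_local, np.full((2, 2), 50.0), depth_gan, np.zeros((2, 2))
)
```

With equal confidences the fused depth is the plain average of the two predictions; as one branch's confidence grows, the result approaches that branch's depth. A learned head would predict the confidence maps from features rather than take them as fixed inputs.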
