RDFC-GAN: RGB-Depth Fusion CycleGAN for Indoor Depth Completion

From arXiv. Haowen Wang and Zhengping Che contributed equally. Paper accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). An earlier version was accepted by CVPR 2022 (arXiv:2203.10856). arXiv admin note: text overlap with arXiv:2203.10856

Raw depth images captured in indoor scenarios frequently exhibit extensive missing values due to the inherent limitations of the sensors and environments. For example, transparent materials frequently elude detection by depth sensors, and surfaces may introduce measurement inaccuracies because of their polished textures, their long distances from the sensor, or their oblique incidence angles relative to it. Incomplete depth maps pose significant challenges for subsequent vision applications, which has prompted the development of numerous depth completion techniques. Many existing methods excel at reconstructing dense depth maps from sparse samples, but they often falter when faced with extensive contiguous regions of missing depth values, a prevalent and critical challenge in indoor environments. To overcome these challenges, we design a novel two-branch end-to-end fusion network named RDFC-GAN, which takes a pair of RGB and incomplete depth images as input and predicts a dense, completed depth map. The first branch employs an encoder-decoder structure that, adhering to the Manhattan world assumption and guided by normal maps derived from RGB-D information, regresses local dense depth values from the raw depth map. The other branch applies an RGB-depth fusion CycleGAN, which translates RGB imagery into detailed, textured depth maps while ensuring high fidelity through cycle consistency. We fuse the two branches via adaptive fusion modules named W-AdaIN and train the model with the help of pseudo depth maps. Comprehensive evaluations on the NYU-Depth V2 and SUN RGB-D datasets show that our method significantly improves depth completion performance, particularly in realistic indoor settings.
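The abstract names the fusion module W-AdaIN but does not define it. As rough intuition only, the following is a minimal NumPy sketch of standard adaptive instance normalization (AdaIN), the general technique the name suggests it builds on; it is not the paper's W-AdaIN. The function name `adain`, the feature shapes, and the choice of which branch supplies the "content" and "style" features are all assumptions made for illustration.

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Standard AdaIN on feature maps shaped (C, H, W).

    Each channel of `content` is normalized to zero mean and unit
    standard deviation, then rescaled and shifted to match the
    per-channel statistics of `style`.
    """
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True)
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True)
    normalized = (content - c_mean) / (c_std + eps)
    return normalized * s_std + s_mean

# Hypothetical feature maps standing in for the two branches.
# Assumption: depth-branch features act as content and RGB-branch
# features as style; the abstract does not specify the roles.
rng = np.random.default_rng(0)
depth_feat = rng.normal(0.0, 1.0, size=(4, 8, 8))
rgb_feat = rng.normal(3.0, 2.0, size=(4, 8, 8))
fused = adain(depth_feat, rgb_feat)
```

After the call, each channel of `fused` keeps the spatial pattern of `depth_feat` but carries the per-channel mean and (approximately) the standard deviation of `rgb_feat`. In a learned fusion module these statistics would typically be predicted by the network rather than computed directly from a second feature map.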
