Depth completion is a critical task in autonomous driving: generating dense depth maps from sparse depth measurements and RGB images. Most existing methods employ a spatial propagation network to iteratively refine the depth map after obtaining an initial dense depth estimate. In this paper, we propose DenseFormer, a novel method that integrates a diffusion model into the depth completion task. Leveraging the denoising mechanism of the diffusion model, DenseFormer generates a dense depth map by progressively refining an initial random depth distribution over multiple iterations. We propose a feature extraction module that uses a feature pyramid structure together with multi-layer deformable attention to effectively extract and integrate features from sparse depth maps and RGB images; these features serve as the guiding condition for the diffusion process. In addition, we present a depth refinement module that applies multi-step iterative refinement over multiple depth ranges to the dense depth produced by the diffusion process, using image features enriched with multi-scale information and the sparse depth input to further improve the accuracy of the predicted depth map. Extensive experiments on the KITTI outdoor scene dataset demonstrate that DenseFormer outperforms classical depth completion methods.
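To make the described diffusion-based refinement concrete, the following is a minimal sketch in PyTorch of the general idea: a dense depth map is recovered by starting from random noise and iteratively denoising it, conditioned on features fused from the RGB image and the sparse depth map. The module names `ConditionEncoder` and `DepthDenoiser`, the small convolutional networks, and the linear noise schedule are illustrative assumptions and do not reflect DenseFormer's actual architecture (which uses a feature pyramid with deformable attention and a separate depth refinement module).

```python
# Sketch of conditional diffusion sampling for depth completion (assumptions, not the paper's exact design).
import torch
import torch.nn as nn

class ConditionEncoder(nn.Module):
    """Hypothetical encoder fusing RGB and sparse depth into guidance features."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + 1, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )

    def forward(self, rgb, sparse_depth):
        return self.net(torch.cat([rgb, sparse_depth], dim=1))

class DepthDenoiser(nn.Module):
    """Hypothetical denoiser predicting the noise present in a noisy depth map."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1 + channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 1, 3, padding=1),
        )

    def forward(self, noisy_depth, cond):
        return self.net(torch.cat([noisy_depth, cond], dim=1))

@torch.no_grad()
def sample_dense_depth(rgb, sparse_depth, encoder, denoiser, steps: int = 50):
    """Standard DDPM-style ancestral sampling over a single-channel depth map."""
    betas = torch.linspace(1e-4, 0.02, steps)      # assumed linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    cond = encoder(rgb, sparse_depth)              # guiding condition for the diffusion process
    depth = torch.randn_like(sparse_depth)         # start from a random depth distribution

    for t in reversed(range(steps)):
        eps = denoiser(depth, cond)                # predict the noise at step t
        # Reverse-step mean (standard DDPM update), then add noise except at the last step.
        depth = (depth - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        if t > 0:
            depth = depth + torch.sqrt(betas[t]) * torch.randn_like(depth)
    return depth

if __name__ == "__main__":
    rgb = torch.rand(1, 3, 64, 64)
    sparse = torch.zeros(1, 1, 64, 64)
    sparse[:, :, ::8, ::8] = torch.rand(1, 1, 8, 8) * 80.0   # sparse LiDAR-like samples
    enc, den = ConditionEncoder(), DepthDenoiser()
    dense = sample_dense_depth(rgb, sparse, enc, den, steps=10)
    print(dense.shape)  # torch.Size([1, 1, 64, 64])
```

In the paper's pipeline, the output of this sampling stage would additionally be passed through the depth refinement module, which iteratively corrects the dense prediction using multi-scale image features and the sparse depth input.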