Depth estimation is critical in autonomous driving for interpreting 3D scenes accurately. Recently, radar-camera depth estimation has attracted considerable interest due to the robustness and low cost of radar. This paper introduces a two-stage, end-to-end trainable Confidence-aware Fusion Net (CaFNet) for dense depth estimation, combining RGB imagery with sparse and noisy radar point cloud data. The first stage addresses radar-specific challenges, such as ambiguous elevation and noisy measurements, by predicting a radar confidence map and a preliminary coarse depth map. A novel approach is presented for generating the ground truth of the confidence map, which associates each radar point with its corresponding object to identify potential projection surfaces. These maps, together with the initial radar input, are processed by a second encoder. For the final depth estimation, we introduce a confidence-aware gated fusion mechanism that integrates radar and image features effectively, enhancing the reliability of the depth map by filtering out radar noise. Evaluated on the nuScenes dataset, our method outperforms the current leading model by 3.2% in Mean Absolute Error (MAE) and 2.7% in Root Mean Square Error (RMSE). Code: https://github.com/harborsarah/CaFNet
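To give a rough intuition for the confidence-aware gated fusion described above, the sketch below shows one common form of such a gate: a sigmoid gate driven by the predicted radar confidence blends radar and image features per pixel, so low-confidence (likely noisy) radar contributes little. This is a minimal illustrative example in NumPy, not the paper's actual architecture; the gate parameters and shapes are hypothetical assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def confidence_gated_fusion(img_feat, radar_feat, conf):
    """Blend per-pixel radar and image features with a confidence-driven gate.

    img_feat, radar_feat: (H, W, C) feature maps
    conf: (H, W, 1) predicted radar confidence in [0, 1]
    """
    # Hypothetical gate: opens toward radar features only where the
    # predicted confidence is high; the scale/bias (4.0, -2.0) are
    # illustrative choices, not values from the paper.
    gate = sigmoid(4.0 * conf - 2.0)
    # Convex combination: gate -> 1 trusts radar, gate -> 0 trusts image.
    return gate * radar_feat + (1.0 - gate) * img_feat

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 4, 8))
radar = rng.normal(size=(4, 4, 8))
conf = rng.uniform(size=(4, 4, 1))
fused = confidence_gated_fusion(img, radar, conf)
print(fused.shape)  # (4, 4, 8)
```

Because the gate output lies in (0, 1), each fused value is an elementwise convex combination of the image and radar features, which is what lets unreliable radar returns be suppressed rather than hard-masked.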