Accurate depth estimation is fundamental to 3D perception in autonomous driving, supporting tasks such as detection, tracking, and motion planning. However, monocular camera-based 3D detection suffers from depth ambiguity and reduced robustness under challenging conditions. Radar offers complementary advantages, such as resilience to poor lighting and adverse weather, but its sparsity and low resolution limit its direct use in detection frameworks. This motivates effective Radar-camera fusion built on improved preprocessing and depth estimation strategies. We propose an end-to-end framework that enhances monocular 3D object detection through two key components. First, we introduce InstaRadar, an instance segmentation-guided expansion method that leverages masks from a pre-trained instance segmentation model to increase Radar point density and semantic alignment, producing a more structured representation. InstaRadar achieves state-of-the-art results in Radar-guided depth estimation, demonstrating its effectiveness in generating high-quality depth features. Second, we integrate the pre-trained RCDPT into the BEVDepth framework as a replacement for its depth estimation module. With InstaRadar-enhanced inputs, the RCDPT integration consistently improves 3D detection performance. Overall, these components yield steady gains over the baseline BEVDepth model, confirming the effectiveness of InstaRadar and the advantage of explicit depth supervision in 3D object detection. The framework still trails Radar-camera fusion models that extract BEV features directly, since Radar here serves only as guidance rather than an independent feature stream; this gap also indicates clear room for improvement. Future work will extend InstaRadar to point cloud-like representations and integrate a dedicated Radar branch with temporal cues for enhanced BEV fusion.
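To make the segmentation-guided expansion idea concrete, the following is a minimal, illustrative sketch of how sparse Radar depths can be spread across instance masks that contain Radar hits. It is a simplified stand-in, not the paper's exact InstaRadar algorithm: the function name, the use of the per-mask median depth, and the zero-as-missing convention are all assumptions for illustration.

```python
import numpy as np

def expand_radar_with_masks(radar_uv, radar_depth, masks, img_shape):
    """Sketch of mask-guided Radar densification (hypothetical helper).

    radar_uv:    list of (u, v) pixel coordinates of projected Radar points
    radar_depth: list of depths (meters) matching radar_uv
    masks:       list of boolean (H, W) instance masks
    img_shape:   (H, W) of the camera image
    Returns a dense (H, W) depth map; 0 marks pixels with no depth.
    """
    H, W = img_shape
    dense = np.zeros((H, W), dtype=np.float32)

    # Start from the raw sparse projection of Radar points onto the image.
    for (u, v), d in zip(radar_uv, radar_depth):
        if 0 <= v < H and 0 <= u < W:
            dense[v, u] = d

    # For each instance mask, spread a robust summary (here: the median)
    # of the Radar depths that fall inside it over the whole mask,
    # aligning the expanded depth with object boundaries.
    for mask in masks:
        hits = [d for (u, v), d in zip(radar_uv, radar_depth)
                if 0 <= v < H and 0 <= u < W and mask[v, u]]
        if hits:
            dense[mask] = float(np.median(hits))
    return dense
```

The design choice worth noting is that expansion is gated by instance membership: depth is only propagated within a mask that actually contains a Radar return, which keeps the densified map semantically aligned with objects instead of blurring depth across background pixels.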