Radars, owing to their robustness to adverse weather conditions and their ability to measure object motion, have served in autonomous driving and intelligent agents for years. However, Radar-based perception suffers from unintuitive sensing data, which lack semantic and structural information about the scene. To tackle this problem, camera and Radar sensor fusion has been investigated as a trending strategy with low cost, high reliability, and good maintainability. While most recent works explore how to fuse Radar point clouds and images, the rich contextual information within Radar observations is discarded. In this paper, we propose a hybrid point-wise Radar-Optical fusion approach for object detection in autonomous driving scenarios. The framework benefits from dense contextual information in both the range-Doppler spectrum and images, which are integrated to learn a multi-modal feature representation. Furthermore, we propose a novel local coordinate formulation that tackles the object detection task in an object-centric coordinate system. Extensive results show that, with the information gained from optical images, we achieve leading performance in object detection (97.69% recall) compared with the recent state-of-the-art method FFT-RadNet (82.86% recall). Ablation studies verify the key design choices and the practicability of our approach when given imperfect, machine-generated detections. The code will be available at https://github.com/LiuLiu-55/ROFusion.