Geometry-Aware Fisheye-LiDAR Fusion for Robust 3D Object Detection in Low-Overlap Setups

As autonomous systems expand from capital-intensive robotaxis to cost-sensitive logistics, sensor configurations are increasingly optimized for coverage per cost. A prevalent sparse-view setup pairs dual fisheye cameras with a roof-mounted LiDAR, which introduces severe geometric challenges: extreme radial distortion, minimal overlap, and misalignment between spherical projections and rectilinear grids. Conventional BEV fusion algorithms typically force the image and point cloud modalities into a unified Cartesian grid early in the pipeline, causing significant feature distortion and information loss for wide-view fisheye cameras. To address this, we propose a Geometry-Aware Hybrid Fusion (GA-HF) framework that explicitly accounts for fisheye geometry and BEV feature distortion. Fisheye features are lifted into a polar BEV grid via a Distortion-Aware Lift-Splat-Shoot (LSS) module to preserve their native angular density, while LiDAR features are processed in their native Cartesian space to retain metric fidelity for bounding box regression. To bridge these heterogeneous streams, we introduce a Dual-Attention Warping Correction module that applies spatial and channel attention to the warped camera features before fusion, suppressing artifacts in low-quality peripheral regions while enhancing high-quality semantic cues. We evaluate GA-HF on three benchmarks: KITTI-360, Dur360BEV, and Fisheye3DOD. To the best of our knowledge, it is the first approach to explore LiDAR-fisheye camera fusion. On KITTI-360, GA-HF improves NDS by 4.2% over Cartesian baselines; on Dur360BEV, it surpasses both LiDAR-only detection and BEVFusion while significantly reducing orientation error despite the geometric distortions; on Fisheye3DOD, it attains the highest detection score among all fusion methods.
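
To make the polar-to-Cartesian step concrete, the following is a minimal sketch of how a polar BEV feature map could be resampled onto a Cartesian BEV grid before fusion. It is not the GA-HF implementation: the function name `polar_to_cartesian_bev`, the grid sizes, the angular convention ([-pi, pi) measured with `arctan2`), and the use of nearest-neighbour lookup are all assumptions chosen for illustration. The returned mask marks Cartesian cells that fall outside the polar grid's radial range, which is one simple way to flag regions where warped camera features are unavailable.

```python
import numpy as np


def polar_to_cartesian_bev(polar_feat, r_max, cart_size, cart_range):
    """Resample a polar BEV feature map onto a square Cartesian BEV grid.

    Hypothetical helper for illustration only.

    polar_feat: array of shape (C, n_r, n_theta); radial bins span
                [0, r_max) and angular bins span [-pi, pi).
    r_max:      outer radius of the polar grid, in metres.
    cart_size:  number of cells per side of the Cartesian grid.
    cart_range: half-width of the Cartesian grid, in metres.

    Returns (cart_feat, valid), where cart_feat has shape
    (C, cart_size, cart_size) and valid is a boolean mask of the same
    spatial size marking cells covered by the polar grid.
    """
    _, n_r, n_theta = polar_feat.shape

    # Cell-centre coordinates of the Cartesian grid.
    step = 2.0 * cart_range / cart_size
    centres = -cart_range + (np.arange(cart_size) + 0.5) * step
    x, y = np.meshgrid(centres, centres, indexing="ij")

    # Polar coordinates of every Cartesian cell centre.
    r = np.hypot(x, y)
    theta = np.arctan2(y, x)

    # Nearest-bin lookup (assumption: no interpolation between bins).
    r_idx = np.floor(r / r_max * n_r).astype(int)
    t_idx = np.floor((theta + np.pi) / (2.0 * np.pi) * n_theta).astype(int)
    t_idx %= n_theta  # wrap theta == pi back to the first angular bin

    # Cells beyond r_max have no polar source bin.
    valid = r_idx < n_r
    r_idx = np.clip(r_idx, 0, n_r - 1)

    # Gather features and zero out the uncovered cells.
    cart_feat = polar_feat[:, r_idx, t_idx] * valid
    return cart_feat, valid


if __name__ == "__main__":
    feat = np.ones((2, 8, 16), dtype=np.float32)
    cart, mask = polar_to_cartesian_bev(feat, r_max=10.0, cart_size=20, cart_range=10.0)
    print(cart.shape, int(mask.sum()))
```

Nearest-neighbour lookup keeps the sketch short; a bilinear sampler would reduce aliasing near the origin, where many Cartesian cells map to the same few polar bins.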
